Table of contents

Fix slow queries in Django when using annotate and subqueries

Fix slow queries in Django when using annotate and subqueries

Django’s ORM is quite useful and versatile, it can perform most of the common SQL operations, such as filtering, partitioning, joins or sorting information, creating aliases, but it also has its limitations, especially when combined with subqueries, today I’ll tell you about one of its limitations and how to solve it.

Despite its few weaknesses, its ORM is one of the reasons why you should use Django .

Django annotate and subqueries, a performance problem

The Django annotate function, which I already told you about in a post where I explain the differences between annotate and aggregate in Django, is used to add information to a SQL query, this information can be some data’s average, a sum or anything else you want, the thing is that, once you start using subqueries in annotate, things can get really messy.

Let me give you a more practical example:

from django.db.models import F
from django.db.models.expressions import Subquery

first_subquery = Subquery(...)
second_subquery = Subquery(...)

queryset = YourModel.objects.annotate(first_annotation=first_subquery)
    .annotate(second_annotation=second_subquery)
    .annotate(
        third_annotation=F("first_subquery") - F("second_subquery"))
    .annotate(
        fourth_annotation=((F("first_subquery") - F("second_subquery")) / F("second_subquery"))
    )

The problem here arises when we combine subqueries with annotate, and then proceed to use those annotations to fill other annotations. Yo dawg, I heard you like annotations so… you get the idea, right?

Django does not have the ability to recognize that it is using the subqueries over and over again, so the SQL it generates repeats the same subqueries over and over again, resulting in a poorly performing query; yes, we’re prey of the famous n+1 queries problem.

SQL generated by Django using annotate and subqueries is inefficient.

But it becomes worse, where exactly is the problem? The django ORM translates the above queryset into the following SQL query:

SELECT columns
        (SELECT ...first_subquery - SELECT ...second_subquery) AS "third_annotation",
        (SELECT ...first_subquery - SELECT ...second_subquery)/(SELECT ...first_subquery) as "fourth_annotation",
        (SELECT ...first_subquery) as "first_annotation",
        (SELECT ...second_subquery) as "second_annotation"
    FROM table_a LEFT OUTER JOIN table_b
    ON table_a.id = table_b.id
    GROUP BY table_a.id ...

Notice how Django is reusing the SQL from each subquery multiple times during the query, instead of performing the query once and then reusing that value.

If you don’t know how to get the SQL query that Django’s ORM generates, I remind you, qs represents your queryset:

print(qs.query)

How to fix this mess? Well, one of the ways to fix this SQL query is to use Common Table Expressions (CTEs), however, as of this writing, Django does not support Common Table Expressions (CTEs), so we will have to use a raw query instead of the methods already provided by the Django ORM.

Use Common Table Expressions (CTEs) to improve annotate and subqueries performance.

The solution is to create a raw query, but it’s not that bad as it looks, remember that modern versions of django allow you to use the raw method of your model manager so that Django automatically assigns it to a queryset object of your respective model.

qs = YourModel.objects.raw("YOUR_SQL_RAW_QUERY_GOES_HERE")

The SQL query with the Common Table Expressions (CTEs) that we will use would look like this:

WITH my_cte AS (
    SELECT 
        a.column
        (SELECT ...subquery_one) AS first_annotation, 
        (SELECT ...subquery_two) AS second_annotation
    FROM table_a 
    LEFT OUTER JOIN table_b 
    ON table_a.id = table_b.id 
    GROUP BY table_a.id ...
)
SELECT 
    columns,
    first_annotation, 
    second_annotation, 
    first_annotation - second_annotation AS third_annotation,
    (first_annotation - second_annotation)/first_annotation AS fourth_annotation
FROM my_cte;

Now, look! As you can see the subqueries are in parentheses and each of them appears only once.

Using Common Table Expressions (CTEs) allowed to use an efficient query, avoiding multiple repetitive queries to the database and will give us a performance that outperforms the Django ORM’s query by several orders of magnitude (I managed to reduce a few queries’ execution time from 13 seconds to just 0.7 seconds).

Hasta que Django no añada soporte para CTEs estamos condenados a implementar CTEs en lugar de usar el ORM. Quizás implementar CTEs sea una de las acciones que se pueden tomar para mejorar el framework de Django.

Eduardo Zepeda
Web developer and GNU/Linux enthusiast. I believe in choosing the right tool for the job and that simplicity is the ultimate sophistication. Better done than perfect. I also believe in the goodnesses of cryptocurrencies outside of monetary speculation.
Read more