You have no excuses now, use this free credit to launch your projects now on Digital Ocean, you're free to spend it whenever you want within the following 60 days.
Table of contents
Fix slow queries in Django when using annotate and subqueries
Fix slow queries in Django when using annotate and subqueries
Django’s ORM is quite useful and versatile, it can perform most of the common SQL operations, such as filtering, partitioning, joins or sorting information, creating aliases, but it also has its limitations, especially when combined with subqueries, today I’ll tell you about one of its limitations and how to solve it.
Despite its few weaknesses, its ORM is one of the reasons why you should use Django .
Django annotate and subqueries, a performance problem
The Django annotate function, which I already told you about in a post where I explain the differences between annotate and aggregate in Django, is used to add information to a SQL query, this information can be some data’s average, a sum or anything else you want, the thing is that, once you start using subqueries in annotate, things can get really messy.
Let me give you a more practical example:
from django.db.models import F
from django.db.models.expressions import Subquery
first_subquery = Subquery(...)
second_subquery = Subquery(...)
queryset = YourModel.objects.annotate(first_annotation=first_subquery)
.annotate(second_annotation=second_subquery)
.annotate(
third_annotation=F("first_subquery") - F("second_subquery"))
.annotate(
fourth_annotation=((F("first_subquery") - F("second_subquery")) / F("second_subquery"))
)
The problem here arises when we combine subqueries with annotate, and then proceed to use those annotations to fill other annotations. Yo dawg, I heard you like annotations so… you get the idea, right?
Django does not have the ability to recognize that it is using the subqueries over and over again, so the SQL it generates repeats the same subqueries over and over again, resulting in a poorly performing query; yes, we’re prey of the famous n+1 queries problem.
SQL generated by Django using annotate and subqueries is inefficient.
But it becomes worse, where exactly is the problem? The django ORM translates the above queryset into the following SQL query:
SELECT columns
(SELECT ...first_subquery - SELECT ...second_subquery) AS "third_annotation",
(SELECT ...first_subquery - SELECT ...second_subquery)/(SELECT ...first_subquery) as "fourth_annotation",
(SELECT ...first_subquery) as "first_annotation",
(SELECT ...second_subquery) as "second_annotation"
FROM table_a LEFT OUTER JOIN table_b
ON table_a.id = table_b.id
GROUP BY table_a.id ...
Notice how Django is reusing the SQL from each subquery multiple times during the query, instead of performing the query once and then reusing that value.
If you don’t know how to get the SQL query that Django’s ORM generates, I remind you, qs represents your queryset:
print(qs.query)
How to fix this mess? Well, one of the ways to fix this SQL query is to use Common Table Expressions (CTEs), however, as of this writing, Django does not support Common Table Expressions (CTEs), so we will have to use a raw query instead of the methods already provided by the Django ORM.
Use Common Table Expressions (CTEs) to improve annotate and subqueries performance.
The solution is to create a raw query, but it’s not that bad as it looks, remember that modern versions of django allow you to use the raw method of your model manager so that Django automatically assigns it to a queryset object of your respective model.
qs = YourModel.objects.raw("YOUR_SQL_RAW_QUERY_GOES_HERE")
The SQL query with the Common Table Expressions (CTEs) that we will use would look like this:
WITH my_cte AS (
SELECT
a.column
(SELECT ...subquery_one) AS first_annotation,
(SELECT ...subquery_two) AS second_annotation
FROM table_a
LEFT OUTER JOIN table_b
ON table_a.id = table_b.id
GROUP BY table_a.id ...
)
SELECT
columns,
first_annotation,
second_annotation,
first_annotation - second_annotation AS third_annotation,
(first_annotation - second_annotation)/first_annotation AS fourth_annotation
FROM my_cte;
Now, look! As you can see the subqueries are in parentheses and each of them appears only once.
Using Common Table Expressions (CTEs) allowed to use an efficient query, avoiding multiple repetitive queries to the database and will give us a performance that outperforms the Django ORM’s query by several orders of magnitude (I managed to reduce a few queries’ execution time from 13 seconds to just 0.7 seconds).
Until Django adds support for CTEs we are doomed to implement CTEs manually instead of using the ORM. Perhaps implementing CTEs is one of the actions that can be taken to improve Django’s framework.