You have no excuses now, use this free credit to launch your projects now on Digital Ocean.
Table of contents
Trigrams and advanced searches with Django and Postgres
Trigrams and advanced searches with Django and Postgres
What if a user’s finger slips on the keyboard and types “parfume” instead of “perfume”. We probably don’t want our user to leave the site because he didn’t find any “parfume” on our website. Our website should return the results that most closely match what they are looking for. See how an experienced ecommerce handles it:
Don’t you remember the basic searches in Django? I have a post about basic searches and full text search using Django and Postgres , if you haven’t read it take a look there first.
Trigrams in Django and Postgres
But how does our application know that when a user types “parfume” they probably mean “perfume”?
The reason why parfume resembles perfume is because both contain similar trigrams.
Trigram? Yes, trigram, of three and gram. **A trigram is three consecutive characters, it’s as simple as that.
Trigrams are three consecutive characters
Need to learn everything Django has to offer?
AdTo master Django you must understand how it works, how a HTTP request goes to the router, which pass the request data to the appropiate function (or class) and then how you can Django's ORM to interact with the database and return a response if you want a detailed explanation of each step, educative has a course on that, certificate included, and Python is covered (in case you don't know Python) link below.
SQL was the alpha and it will be the omega
AdIf you're starting with databases and backend, let me tell you something: forget about non-relational databases for now, start with SQL and thank me later. First earn how to perform SELECTS, INSERTS, UPDATES, DELETES, JOINS and all the other functions first, and if you're too lazy to look for a good, well structured and sequential tutorial check the ones on educative, certificate included.
Need to learn everything Django has to offer?
AdTo master Django you must understand how it works, how a HTTP request goes to the router, which pass the request data to the appropiate function (or class) and then how you can Django's ORM to interact with the database and return a response if you want a detailed explanation of each step, educative has a course on that, certificate included, and Python is covered (in case you don't know Python) link below.
Trigrams and similar words
According to Postgres , we can tell how similar two strings are by comparing the number of trigrams they share, and Django provides functions to work with trigrams.
Videogame.objects.filter(name__trigram_similar="automatta") # it should say automata
<QuerySet [<Videogame: Nier automata>]>
Videogame.objects.filter(name__trigram_similar="autommattaa") # it should say automata
<QuerySet [<Videogame: Nier automata>]>
#...FROM "videogame_videogame" WHERE UNACCENT("videogame_videogame"."name") % UNACCENT(autommata)
View the trigrams for the word “automata” directly from the postgres terminal
SELECT show_trgm('automata');
show_trgm
---------------------------------------------
{" a"," au",ata,aut,mat,oma,"ta ",tom,uto}
Now look at the trigrams for “automatta” (if you didn’t notice, this one has a double “t”).
SELECT show_trgm('autommattaa');
show_trgm
---------------------------------------------------------
{" a"," au","aa ",att,aut,mat,mma,omm,taa,tom,tta,uto}
Can you notice how they both share some trigrams (a, au, aut, mat, tom, uto)?
Also note that the quotation marks around certain trigrams are to specify trigrams with spaces.
The number of trigrams shared by a pair of text strings can be expressed by means of an index. The more trigrams shared, the higher this index will be.
We can find the similarity index, according to their trigrams, between two words from the postgres terminal.
SELECT word_similarity('outer worlds', 'wilds');
word_similarity
-----------------
0.1875
Sort by similarity with trigrams
What if we want our Django search to find even those words that match in a smaller number of trigrams?
Using the Django ORM with the TrigramSimilarity function we can filter those results by setting a similarity threshold between a search word and our data.
If you don’t remember what Django annotate is for, I have a post where I explain django annotate and aggregate, as well as their differences
from django.contrib.postgres.search import TrigramSimilarity
results = Videogame.objects.annotate(similarity=TrigramSimilarity('name', 'wilds'), ).filter(similarity__gt=0.1).order_by('-similarity')
<QuerySet [<Videogame: Outer wilds>, <Videogame: Outer worlds>]> # With a similarity index of 0.1, the words wilds y worlds match
results[0].similarity
# 0.5
# ...SIMILARITY("videogame_videogame"."name", wilds) AS "similarity" FROM "videogame_videogame" WHERE SIMILARITY("videogame_videogame"."name", wilds) > 0.1 ORDER BY "similarity" DESC
Search rank to sort by relevance
If a user searches for a laptop and your application first shows them laptop cases, laptop backpacks, other related items and then laptops, you are providing an inadequate user experience.
Search Rank allows you to sort user searches by relevance, so that your user finds exactly what they are looking for first and then everything else.
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
vector = SearchVector('name')
query = SearchQuery('days')
results = Videogame.objects.annotate(rank=SearchRank(vector, query)).order_by('-rank')
results[0].rank
# 0.0607927
# ... ts_rank(to_tsvector(COALESCE("videogame_videogame"."name", )), plainto_tsquery(days)) AS "rank" FROM "videogame_videogame" ORDER BY "rank" DESC
Search Rank, with the help of the to_tsvector and plainto_tsquery function, will sort our search results according to the matches it finds between the vector and the query and **return each of the results of our query with a rank property that shows the value for its respective element.
Assign importance by field
In a search, not all fields should matter equally
Imagine you have a book database and a Book model with a title field and a content field. If a user searches for “magic” books, it would probably be correct to return books that contain “magic” in the title, such as “Magic spells and ceremonies” or “Magic rituals” or “History of magic”.
On the other hand, Harry Potter books also relate to user interests, as they mention the word “magic” multiple times in their description, however, you probably want your search to prioritize those that contain the word “magic” in the title, not in the description.
With Posgres the above is possible.
To do this, we assign a weight, weighting, priority or whatever you want to call it, in the form of a letter, to each search vector, together with the name of the field to which it corresponds, and we join them into one.
We can choose between the letters “A”, “B”, “C” and “D”. Each letter will have a different value of relevance in our search; “A” for the highest value and “D” for the lowest.
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
vector = SearchVector('title', weight='A') + SearchVector('descripcion', weight='B')
query = SearchQuery('Magic')
Book.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by('rank')
Exactly they have the following values:
- D = 0.1
- C = 0.2
- B = 0.4
- A = 1.0
These values can be overwritten to suit your needs, according to the type of business and models you use.
Book.objects.annotate(rank=SearchRank(vector, query), weights=[0.1, 0.2, 0.3, 0.9]).filter(rank__gte=0.3).order_by('rank').filter(rank__gte=0.3).order_by('rank')
In the example above, I have rewritten the original values and decreased the values of the letters “D”, “C”, “B” so that they represent a much smaller percentage compared to the letter “A”.
Django Libraries for advanced searches
Perhaps your search needs are much more advanced than those provided by the Django ORM combined with Postgres. But, unless you are developing something that will revolutionize the search industry, someone has already gone through the same problem. There are generic solutions, such as Solr and Django-haystack , that save you from writing many, many lines of code. Some examples are: