Table of contents

Autocomplete, Suggestions and Related Content: Solr and Django

Autocomplete, Suggestions and Related Content: Solr and Django

Solr, together with Lucene, is an outstanding search engine that allows you to perform searches with advanced features. In this post I bring you a summary of some of the most interesting features of Solr and Django Haystack.

I assume you already have a configured django app with Solr, in case you don’t, check my previous post.

Behavior of default AND and OR searches

Haystack allows us to define a default behavior for all searches, either by joining terms with AND or OR operators. The default value is AND but you can modify it in the configuration file.

# settings.py
HAYSTACK_DEFAULT_OPERATOR = 'AND'
HAYSTACK_DEFAULT_OPERATOR = 'OR'

Search by specific field with Solr

If we want to limit our search to a specific field of the object we define as index we simply pass it as a parameter, together with the text string to search.

def vista(request):
    results = SearchQuerySet().models(<modelo>).filter(title='<query text>')

This will allow us to search in the title field for our search term.

Remember that Solr sorts results by relevance? Sometimes we want to increase the relevance in certain cases, for example: maybe you want the last term searched by your user to influence the search. For this we add the boost method and the relative importance value we want to give it.

Increment per search term

sqs = SearchQuerySet().boost('<término>', 1.2)

Increment per field

When we want Solr to give more or less importance to a given field when performing a search, we pass the parameter boost to our field.

# app/search_indexes.py

from haystack import indexes
from .models import Videogame

class VideogameIndex(indexes.SearchIndex, indexes.Indexable):
    # ... otros dcampos
    name = indexes.CharField(model_attr='name', boost=1.5)

This increment is only valid when filtering by the field to which the boost is applied.

SearchQuerySet().filter(SQ(content='<query>') | SQ(name='<query>'))

Solr search suggestions

To enable this feature we need to set this option in the HAYSTACK_CONNECTIONS variable in the Django configuration file.

# settings.py

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.solr_backend.SolrEngine',
        'URL': 'http://127.0.0.1:8983/solr/<nombre_del_núcleo>',
        'INCLUDE_SPELLING': True
    },
}

Search_index configuration

First we need to create a suggestion field, which will take its information from the default text field.

class VideogameIndex(indexes.SearchIndex, indexes.Indexable):
    # ...
    suggestions = indexes.FacetCharField()

    def prepare(self, obj):
        prepared_data = super().prepare(obj)
        prepared_data['suggestions'] = prepared_data['text']
        return prepared_data

In addition, we need to modify our configuration file solrconfig.xml and add the following settings

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_general</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

And replace the SearchHandler in the same file.

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.alternativeTermCount">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">10</str>
    <str name="spellcheck.maxCollations">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

If the configuration is correct, we will be able to obtain suggestions for our searches as follows.

query = SearchQuerySet().auto_query("<mla ecsrito>")
query.spelling_suggestion() # u'mal escrito'

This allows us to correct small errors in the search, just as if we were using trigrams in Postgres and Django.

Activating this search took me a lot of work, it seems that it is not activated by default and that you have to visit the url with &spellcheck.reload=true to generate the proper index, but who knows, maybe in newer versions it won’t be necessary.

http://127.0.0.1:8983/solr/#/<instancia>/query?q=text:<termino>&q.op=OR&spellcheck=true&spellcheck.q=<termino>&spellcheck.reload=true

Exclusive and exact searches with Solr

Solr allows us to perform advanced searches using the auto_query method, which will allow a search syntax similar to that offered by google and other popular search engines.

  • Use a hyphen ("-") to exclude results that include those terms.
  • Use double quotation marks ("") to establish the correct order of words
query = SearchQuerySet().auto_query('<termino> -<excluye>')
otra_query = SearchQuerySet().auto_query('"<orden exacto>" -<excluye>')

Autocomplete with Solr

We can also perform an auto-completion of our search term, so that it will give us some valid suggestions.

For that we first need to create a new field in our class that serves as a search index.

We use any name and generally you will be using a field of type EdgeNgramField, in the same way, we declare the field of our model to which it will make reference, in this case name. The other option is an NgramField field but this is usually used for Asian languages.

class VideogameIndex(indexes.SearchIndex, indexes.Indexable):
    # ...
    name_autocomplete = indexes.EdgeNgramField(model_attr='name')
    # ...

Now in search

SearchquerySet().autocomplete(name_autocomplete='incompl')
# Devolverá resultados para: 'incompleto', 'incompletitud'

Sometimes we want to get more of the same item, this is ideal for product recommendations in online stores. For this there is the more_like_this method, to which we pass an instance of a Django model and it will return similar objects.

But to make it work, we will first add the handler to our solrconfig.xml file in our solr core configuration.

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
    <str name="mlt.minwl">3</str>
    <str name="mlt.maxwl">15</str>
    <str name="mlt.maxqt">20</str>
    <str name="mlt.match.include">false</str>
  </lst>
  </requestHandler>

It doesn’t matter how you get the instance, or what it is, the important thing is that you get a single instance and pass it as a parameter to the more_like_this method.

instance = Videogame.objects.get(name__icontains="<algo>")
# otro ejemplo: 
# instance = Videogame.objects.get(pk=5)
related = SearchQuerySet().more_like_this(instance)

This was just a summary of some of the most useful functions, for a complete list check the django haystack documentation.

Eduardo Zepeda
Web developer and GNU/Linux enthusiast always learning something new. I believe in choosing the right tool for the job and that simplicity is the ultimate sophistication. I'm under the impression that being perfect is the enemy of getting things done. I also believe in the goodnesses of cryptocurrencies outside of monetary speculation.
Read more