Elasticsearch: Building Autocomplete functionality


What is Autocomplete?

Let’s take a very common example. Whenever you go to Google and start typing, a drop-down appears that lists suggestions. These suggestions are related to the query and help the user complete it.

Suggestions when typing on Google

As Wikipedia puts it:

Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing.

It is also known as Search as you type or Type Ahead Search. It guides users by prompting them with likely completions and alternatives to the text as they type. It reduces the number of characters a user needs to type before executing a search, thereby improving the search experience.

Autocomplete can be implemented on top of most databases. In this post, we will use Elasticsearch to build autocomplete functionality.

Elasticsearch is an open-source, distributed, JSON-based search engine built on top of Lucene.

Approaches

There are various approaches to building autocomplete functionality in Elasticsearch. We will discuss the following three.

Prefix Query

This approach involves using a prefix query against a custom field. The value for this field can be stored as a keyword, so that multiple terms (words) are stored together as a single term. This can be accomplished by using the keyword tokeniser. This approach has some disadvantages, the main one being that it only matches at the beginning of the stored value.
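To make that limitation concrete, here is a minimal Python sketch (a simulation, not Elasticsearch itself) of how a keyword-tokenised field behaves under a prefix query. It assumes a lowercase filter follows the keyword tokeniser and that the prefix is sent in lowercase; the four titles are a hand-picked sample, not the post’s dataset.

```python
# Sketch only: simulates a prefix query on a keyword-tokenised field.
# Assumption: a lowercase filter follows the keyword tokeniser.


def keyword_analyse(text: str) -> list[str]:
    """Keyword tokeniser + lowercase: the whole value becomes ONE term."""
    return [text.lower()]


def prefix_match(title: str, prefix: str) -> bool:
    """True if any indexed term starts with the prefix.

    A prefix query is term-level, so the prefix itself is not analysed;
    callers are assumed to send it in lowercase.
    """
    return any(term.startswith(prefix) for term in keyword_analyse(title))


# Hand-picked sample titles (illustrative, not the post's dataset).
titles = [
    "Thor",
    "The Avengers",
    "Captain America: The Winter Soldier",
    "Guardians of the Galaxy",
]

for prefix in ("th", "am"):
    print(prefix, "->", [t for t in titles if prefix_match(t, prefix)])
```

Because each title is stored as a single term, th only finds titles that start with th, and am finds nothing at all.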

Edge Ngrams

This approach involves using different analysers at index and search time. When indexing the document, a custom analyser with an edge n-gram filter can be applied. At search time, the standard analyser can be applied, which prevents the query itself from being split into n-grams.
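As a rough sketch of such a setup, the Python dict below is an index-creation body with an edge n-gram analyser at index time and the standard analyser at search time. The filter and analyser names (autocomplete_filter, autocomplete) and the gram sizes are illustrative assumptions; only the name.edgengram sub-field comes from the query used later in this post. It uses the typeless mapping format of Elasticsearch 7 and later; with a mapping type such as marvels, older versions nest the properties under the type name.

```python
import json

# Hypothetical index body: "autocomplete_filter", "autocomplete" and the
# gram sizes are illustrative; "name.edgengram" matches the post's query.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "autocomplete_filter"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "edgengram": {
                        "type": "text",
                        "analyzer": "autocomplete",     # index time: edge n-grams
                        "search_analyzer": "standard",  # search time: whole words
                    }
                },
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```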

The edge n-gram tokeniser first breaks the text down into words on configurable characters (whitespace, punctuation, etc.) and then emits n-grams anchored to the start of each word.
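The following Python function approximates that behaviour so you can see which terms end up in the index. It is a sketch, not Lucene’s implementation: splitting on any non-alphanumeric character stands in for the tokeniser’s configurable token_chars, and lowercasing stands in for a lowercase filter.

```python
import re


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Approximate an edge n-gram tokeniser followed by lowercasing.

    Splits on any non-alphanumeric character (standing in for token_chars),
    then emits prefixes of each word whose lengths fall between min_gram
    and max_gram.
    """
    grams = []
    for word in re.split(r"[^0-9a-z]+", text.lower()):
        if not word:
            continue  # skip empty pieces left by leading/trailing separators
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.append(word[:n])
    return grams


print(edge_ngrams("Captain America", max_gram=3))
```

Here the word america contributes a and am, which is why a search for am can later match it even though it is not at the start of the title.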

This approach also works well for matching a query in the middle of the text, since every word contributes its own prefixes. Queries are generally fast, but indexing is slower and the index grows larger, because each word is stored as many n-gram terms.
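To illustrate both the middle-of-text match and the storage cost, this sketch builds a toy in-memory inverted index from edge n-grams, using the same approximate tokenisation as a stand-in for the real analyser. The sample titles are hand-picked for illustration.

```python
import re
from collections import defaultdict


def words(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 10) -> list[str]:
    """Prefixes of each word with lengths min_gram..max_gram."""
    return [
        word[:n]
        for word in words(text)
        for n in range(min_gram, min(max_gram, len(word)) + 1)
    ]


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Toy inverted index: n-gram term -> ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for gram in edge_ngrams(doc):
            index[gram].add(doc_id)
    return index


titles = [
    "Thor",
    "The Avengers",
    "Captain America: The Winter Soldier",
    "Guardians of the Galaxy",
]
index = build_index(titles)


def lookup(term: str) -> list[str]:
    """Titles whose indexed n-grams contain the exact term."""
    return sorted(titles[i] for i in index.get(term, set()))


total_words = sum(len(words(t)) for t in titles)
total_grams = sum(len(edge_ngrams(t)) for t in titles)
print(lookup("am"))
print(total_words, "words ->", total_grams, "indexed n-gram tokens")
```

With these four titles, 12 words become 65 n-gram tokens, which is where the extra indexing time and storage come from.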

Completion Suggester

Elasticsearch ships with an in-house solution called the Completion Suggester. It uses an in-memory data structure called a Finite State Transducer (FST). Elasticsearch stores FSTs on a per-segment basis, which means suggestions scale horizontally as new nodes are added.
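Lucene’s FST is a compact, minimised automaton, and reproducing it would take far more than a short example. As a simpler stand-in that shows the same lookup idea (walk the typed prefix, then enumerate the completions below it), here is a plain Python trie. It is not an FST: it does not share suffixes, and it ranks results alphabetically where the Completion Suggester ranks by weight. Inputs are normalised to lowercase letters separated by single spaces, roughly mimicking the simple analyser that completion fields use by default.

```python
import re


def normalise(text: str) -> str:
    """Lowercase letters only, single-space separated (simple-analyser-like)."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.outputs: list[str] = []  # suggestions whose input ends here


class CompletionTrie:
    """Prefix lookup structure standing in for an FST (illustration only)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, input_text: str, suggestion: str) -> None:
        node = self.root
        for ch in normalise(input_text):
            node = node.children.setdefault(ch, TrieNode())
        node.outputs.append(suggestion)

    def suggest(self, prefix: str, size: int = 5) -> list[str]:
        # 1. Walk down the trie following the typed prefix.
        node = self.root
        for ch in normalise(prefix):
            node = node.children.get(ch)
            if node is None:
                return []
        # 2. Depth-first walk below that node, in alphabetical order.
        results: list[str] = []
        stack = [node]
        while stack and len(results) < size:
            current = stack.pop()
            results.extend(current.outputs)
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:size]


trie = CompletionTrie()
for title in [
    "Captain America: The First Avenger",
    "Captain America: The Winter Soldier",
    "Captain America: Civil War",
    "Guardians of the Galaxy",
]:
    trie.add(title, title)

print(trie.suggest("captain america the"))
```

The cost of a lookup depends on the prefix length and the number of completions returned, not on the total number of titles, which is the property that makes this family of structures attractive for autocomplete.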

Some of the things to keep in mind when implementing the Completion Suggester

This is the ideal approach for implementing autocomplete functionality; however, it also has certain disadvantages

Implementation

Let’s implement the above approaches in Elasticsearch. We will be using Marvel movie data to build our sample index. For easy reference, here is the

We will create an index named movies with the type marvels.

Looking at the mapping, we can see that name is a multi-field: it contains several sub-fields, each analysed in a different way.

We will index all our movies by using

Let’s start with the Prefix Query approach and try finding movies beginning with th.

The query will be

This will result in the following movie

The result is reasonable, but movies such as Captain America: The Winter Soldier and Guardians of the Galaxy are missed, because a prefix query only matches at the beginning of the text, not in the middle.

Let’s try finding another movie, this time one containing a word that begins with am.

Here we do not get any results, although Captain America satisfies this condition. This confirms that a prefix query cannot be used to match in the middle of the text.

Let’s run the same search, am, but with the Edge Ngram approach.

{
  "query": {
    "match": {
      "name.edgengram": "am"
    }
  }
}

Here we get the following result

Let’s search for Captain America again, but this time with a longer phrase: captain america the.

Using Edge N-gram approach, we get the following movies

Looking at the results, only the first two suggestions make sense. So many documents match because of how the match clause works: by default, match returns every document that contains captain OR america OR the. Since the field is analysed with edge n-grams, a document also matches when any of its words merely starts with one of these terms, so even more suggestions (if present) get included.
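The OR behaviour is easy to reproduce in a few lines of Python. This sketch uses the same approximate edge n-gram tokenisation for the indexed titles and whole lowercased words for the query (roughly what the standard search analyser produces), with hand-picked sample titles. It also simulates operator="and", mirroring the match query’s operator parameter, which is one knob to try when OR returns too much.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return [w for w in re.split(r"[^0-9a-z]+", text.lower()) if w]


def edge_ngrams(text: str, max_gram: int = 10) -> set[str]:
    """Set of word-anchored prefixes (the indexed terms for one title)."""
    return {w[:n] for w in words(text) for n in range(1, min(max_gram, len(w)) + 1)}


titles = [
    "Captain America: The First Avenger",
    "Captain America: The Winter Soldier",
    "Thor: The Dark World",
    "Guardians of the Galaxy",
    "Iron Man",
]


def match(query: str, operator: str = "or") -> list[str]:
    """Simulate a match query on an edge n-gram field.

    The query is split into whole words (search analyser); a title matches
    if ANY word (operator="or") or ALL words (operator="and") appear among
    its indexed n-grams.
    """
    terms = words(query)
    combine = all if operator == "and" else any
    return [t for t in titles if combine(term in edge_ngrams(t) for term in terms)]


print(match("captain america the"))
print(match("captain america the", operator="and"))
```

With OR, the lone term the is enough to pull in Thor: The Dark World and Guardians of the Galaxy; with AND, only the two Captain America titles remain.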

Let’s try the suggestion query for the same phrase, captain america the. A suggestion query is written in a slightly different way from a regular search query.

We get the following movies as result
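For reference, here is a sketch of the general shape of a completion suggestion request, written as a Python dict. It sits under a top-level suggest key rather than query, and the suggestions come back in a separate suggest section of the response. The field name name.completion is a hypothetical sub-field that would need to be mapped with type completion; movie-suggest is the suggestion name this post uses.

```python
import json


def completion_request(prefix: str, size: int = 5) -> dict:
    """Build a _search body that asks only for completion suggestions.

    "name.completion" is a hypothetical field mapped as {"type": "completion"}.
    """
    return {
        "suggest": {
            "movie-suggest": {
                "prefix": prefix,
                "completion": {
                    "field": "name.completion",
                    "size": size,
                },
            }
        }
    }


print(json.dumps(completion_request("captain america the"), indent=2))
```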

Let’s try the same query, but this time with a typo: captain amrica the.

The movie-suggest query above returns no results because it does not support fuzziness. We can update the query to support fuzziness in the following way

The above query returns the following results
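In Elasticsearch, the completion section accepts a fuzzy object (for example with a fuzziness setting) for this purpose. The sketch below is not that implementation; it is a small Python illustration of the underlying idea, fuzzy prefix matching: a suggestion qualifies if the typed text is within a given number of edits of some prefix of the input. It uses plain Levenshtein distance (Elasticsearch additionally counts a transposition as one edit by default), and the titles are hand-picked samples.

```python
import re


def normalise(text: str) -> str:
    """Lowercase letters only, single-space separated."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def fuzzy_prefix_distance(query: str, text: str) -> int:
    """Smallest Levenshtein distance between query and ANY prefix of text."""
    prev = list(range(len(text) + 1))  # row for an empty query
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(text)
        for j in range(1, len(text) + 1):
            cost = 0 if query[i - 1] == text[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # drop a query character
                cur[j - 1] + 1,      # add a missing character
                prev[j - 1] + cost,  # match or substitute
            )
        prev = cur
    # prev[j] is now the distance from the full query to text[:j].
    return min(prev)


titles = [
    "Captain America: The First Avenger",
    "Captain America: The Winter Soldier",
    "Iron Man",
]


def fuzzy_suggest(prefix: str, max_edits: int = 1) -> list[str]:
    """Titles whose normalised form starts within max_edits of the prefix."""
    q = normalise(prefix)
    return [t for t in titles if fuzzy_prefix_distance(q, normalise(t)) <= max_edits]


print(fuzzy_suggest("captain amrica the", max_edits=0))
print(fuzzy_suggest("captain amrica the", max_edits=1))
```

With zero edits allowed the typo finds nothing, as in the plain suggestion query; allowing one edit (the missing e) recovers both Captain America titles.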

Conclusion

Various approaches can be used to implement autocomplete functionality in Elasticsearch. The Completion Suggester covers most of the cases required to implement a fully functional and fast autocomplete.

Taranjeet Singh


Full Stack Developer, Entrepreneur
