Filtered queries in Elasticsearch

Queries in Elasticsearch are not limited to full-text search: their results can also be filtered. In the Elasticsearch world, a filter is a different operation from a query.

This article describes how to use filters to reduce the number of returned documents and restrict them to the expected criteria. It consists of 3 parts. The first one explains the purpose of filters in queries and lists some of the main filters. The second part describes the main differences between filters and queries. The last part shows how to implement filters in queries built with the Elasticsearch Java API.

Filters in Elasticsearch

Filters, as the name indicates, execute operations whose result is either true or false. In other words, they eliminate documents that shouldn't be returned in the response, so those documents never have to be analyzed by the query. They are well suited to matching exact values (unlike queries, which are used more for full-text search).
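The true-or-false nature of a filter can be illustrated without Elasticsearch at all. The sketch below is plain Java, not the Elasticsearch API: the Match class, its fields and the seasonIs helper are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class ExactMatchFilterSketch {

  // Hypothetical document: one football match with its season and host team.
  static final class Match {
    final String season;
    final String hostTeam;

    Match(String season, String hostTeam) {
      this.season = season;
      this.hostTeam = hostTeam;
    }
  }

  // A filter only answers true or false for a document; it never ranks it.
  static Predicate<Match> seasonIs(String season) {
    return match -> match.season.equals(season);
  }

  // Keeps the documents accepted by the filter, in their original order.
  static List<Match> apply(List<Match> matches, Predicate<Match> filter) {
    List<Match> kept = new ArrayList<>();
    for (Match match : matches) {
      if (filter.test(match)) {
        kept.add(match);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Match> matches = Arrays.asList(
        new Match("2013/2014", "Ajax"),
        new Match("2012/2013", "Ajax"),
        new Match("2013/2014", "PSV"));
    // Only the two matches of season 2013/2014 are printed.
    for (Match kept : apply(matches, seasonIs("2013/2014"))) {
      System.out.println(kept.season + " " + kept.hostTeam);
    }
  }
}
```

The comparison is an exact equals, so "2013/2014" matches while "2013-2014" would not, which is the behaviour we expect from a filter on exact values.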

In the list of filters implemented in Elasticsearch, we can find among others:

Obviously, this list is not exhaustive. There are many other filters suited to different use cases. For example, there are filters that deal with geographical constraints and others that work with parent-child document mappings. There is even a regexp filter, which filters against a given regular expression.

Another powerful feature of filters is caching. The results of some of the previously listed filters are cached by default. Thanks to that, when the same filter is executed again, it can provide its response very quickly.
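To see why a cached filter answers faster the second time, here is a minimal plain-Java sketch. It does not reproduce Elasticsearch's cache. It only assumes that a filter result can be stored as a set of matching document ids and reused as long as the filter is the same. The class name and the "season" data are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FilterCacheSketch {

  // Hypothetical index: position in the list = document id, value = season.
  private final List<String> seasons;
  // Cached results: one bit per document, keyed by the filtered value.
  private final Map<String, BitSet> cache = new HashMap<>();
  private int evaluations = 0;

  FilterCacheSketch(List<String> seasons) {
    this.seasons = seasons;
  }

  // Returns the ids of documents whose season equals the given value.
  // The returned BitSet is the cached instance; callers must not modify it.
  BitSet seasonFilter(String season) {
    BitSet cached = cache.get(season);
    if (cached != null) {
      return cached; // second and later calls: no document is inspected
    }
    BitSet bits = new BitSet(seasons.size());
    for (int docId = 0; docId < seasons.size(); docId++) {
      if (seasons.get(docId).equals(season)) {
        bits.set(docId);
      }
    }
    evaluations++;
    cache.put(season, bits);
    return bits;
  }

  // Number of times the documents were really scanned.
  int evaluations() {
    return evaluations;
  }

  public static void main(String[] args) {
    FilterCacheSketch index = new FilterCacheSketch(
        Arrays.asList("2013/2014", "2012/2013", "2013/2014"));
    index.seasonFilter("2013/2014");
    index.seasonFilter("2013/2014");
    System.out.println(index.evaluations()); // prints 1
  }
}
```

Such a cache works because a filter result is a plain yes or no per document; there is no score depending on the rest of the request.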

Differences between filters and queries

Compared to queries, filters can be thought of as operations for controlled, exact matches. Queries, on the other hand, are closer to SQL LIKE-style matches whose content is difficult to predict. The difference in predictability is that a filter value is fixed by the application (always "1", or always "one"), while in a query some users will write "1", others "one" or even "first". So typically, filters are better for exact matches, while queries are meant for full-text search (documents containing a word, documents best matching a given word).

Another difference is that filters aren't based on scoring. They determine which documents don't satisfy the criteria, but they can't tell how well the remaining documents match. Queries, as mentioned in one of the previous articles, are based on scoring, i.e. on how relevant the returned documents are to the given query.
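The contrast between a yes/no answer and a relevance score can be sketched in a few lines of plain Java. The scoring formula below is a toy one (the share of words equal to the searched term) chosen only for illustration; it is not the scoring used by Elasticsearch, and both method names are hypothetical.

```java
import java.util.Locale;

final class FilterVersusQuerySketch {

  // Filter: exact comparison, the answer is only true or false.
  static boolean termFilter(String fieldValue, String expected) {
    return fieldValue.equals(expected);
  }

  // Query: toy relevance, the share of words in the text equal to the term
  // (case-insensitive). Higher means "matches better".
  static double toyScore(String text, String term) {
    String[] words = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
    String wanted = term.toLowerCase(Locale.ROOT);
    int hits = 0;
    for (String word : words) {
      if (word.equals(wanted)) {
        hits++;
      }
    }
    return (double) hits / words.length;
  }

  public static void main(String[] args) {
    System.out.println(termFilter("Ajax", "Ajax"));             // prints true
    System.out.println(termFilter("Ajax", "ajax"));             // prints false
    System.out.println(toyScore("late goal wins derby", "goal")); // prints 0.25
  }
}
```

Two documents accepted by termFilter are indistinguishable from each other, while toyScore lets us order documents from the most to the least relevant.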

In many scenarios, filters perform better. They can produce correct responses more quickly thanks to the caching of previous executions.

Note, however, that filters and queries can be mixed. A filter context can be placed inside a query context and vice versa.

Filters example with Elasticsearch Java API

In our sample project we very often use bool filters to match documents against two distinct criteria. This is useful when we want to find the matches of a specific team, knowing that this team can appear either as the host team or as the guest team. To facilitate reuse, shared filters are gathered in the QueryFilters utility class (only distinct filters are listed below):

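import java.util.Collection;

import org.elasticsearch.index.query.BoolFilterBuilder;
import org.elasticsearch.index.query.FilterBuilders;
import org.elasticsearch.index.query.RangeFilterBuilder;
import org.elasticsearch.index.query.TermFilterBuilder;
import org.elasticsearch.index.query.TermsFilterBuilder;

// RangeModes comes from the sample project (its source is not shown here).
// hostTeam(...) and guestTeam(...) belong to this class but are omitted from the excerpt.
public final class QueryFilters {

  // Exact match on the "team" field.
  public static TermFilterBuilder team(String teamName) {
    return FilterBuilders.termFilter("team", teamName);
  }

  // Range on "hostGoals"; rangeMode selects the comparison (for example RangeModes.GTE).
  public static RangeFilterBuilder hostGoals(int goals, RangeModes rangeMode) {
    return rangeMode.get(FilterBuilders.rangeFilter("hostGoals"), goals);
  }

  // Matches documents where the team plays either as host or as guest.
  public static BoolFilterBuilder guestOrHostTeam(String teamName) {
    return FilterBuilders.boolFilter()
      .should(QueryFilters.hostTeam(teamName), QueryFilters.guestTeam(teamName));
  }

  // Matches documents whose "season" is one of the given values.
  public static TermsFilterBuilder seasonsIn(Collection<String> seasons) {
    return FilterBuilders.termsFilter("season", seasons);
  }

}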
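The RangeModes helper used by hostGoals is not shown in the article. One plausible shape, given purely as an assumption, is an enum whose constants pick the comparison to apply on a range builder. To keep the sketch runnable without Elasticsearch on the classpath, the RangeFilter class below is a hypothetical stand-in for RangeFilterBuilder; only the idea of the enum is illustrated, not the project's real code.

```java
final class RangeModesSketch {

  // Stand-in for a range filter builder: it only remembers one bound.
  static final class RangeFilter {
    final String field;
    private String operator = "";
    private int bound;

    RangeFilter(String field) {
      this.field = field;
    }

    RangeFilter gt(int value) { return with(">", value); }
    RangeFilter gte(int value) { return with(">=", value); }
    RangeFilter lt(int value) { return with("<", value); }
    RangeFilter lte(int value) { return with("<=", value); }

    private RangeFilter with(String newOperator, int newBound) {
      this.operator = newOperator;
      this.bound = newBound;
      return this;
    }

    // True when the given field value satisfies the configured bound.
    boolean matches(int value) {
      switch (operator) {
        case ">": return value > bound;
        case ">=": return value >= bound;
        case "<": return value < bound;
        case "<=": return value <= bound;
        default: return true; // no bound configured yet
      }
    }

    @Override
    public String toString() {
      return field + " " + operator + " " + bound;
    }
  }

  // Hypothetical equivalent of the project's RangeModes: each constant
  // chooses which comparison is applied to the filter.
  enum RangeModes {
    GT, GTE, LT, LTE;

    RangeFilter get(RangeFilter filter, int value) {
      switch (this) {
        case GT: return filter.gt(value);
        case GTE: return filter.gte(value);
        case LT: return filter.lt(value);
        default: return filter.lte(value);
      }
    }
  }

  public static void main(String[] args) {
    RangeFilter atLeastTwo = RangeModes.GTE.get(new RangeFilter("hostGoals"), 2);
    System.out.println(atLeastTwo); // prints "hostGoals >= 2"
  }
}
```

With such an enum, the caller decides between "greater than" and "greater than or equal" through a parameter instead of one method per comparison.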

Filters, like the rest of the query elements in the Elasticsearch Java API, are constructed with specific builders. They are all exposed by the org.elasticsearch.index.query.FilterBuilders abstract class. It provides builders for, among others, the regexp, terms, range and bool filters. These builders offer the same properties as the ones available through DSL queries. Their use in queries looks like:

FilterBuilder filterBuilder = FilterBuilders.boolFilter()
  .should(
    FilterBuilders.boolFilter().must(
      QueryFilters.hostTeam(teamName),
      QueryFilters.hostGoals(scoredGoals, RangeModes.GTE)
    ),
    FilterBuilders.boolFilter().must(
      QueryFilters.guestTeam(teamName),
      QueryFilters.guestGoals(scoredGoals, RangeModes.GTE)
    )
  );

SearchResponse response = index.scores()
  .setQuery(QueryBuilders.filteredQuery(
    QueryBuilders.matchAllQuery(),
    filterBuilder
  ))
  .get();

Here we create a new instance of an object representing a filtered query. This kind of query is used to exclude non-matching documents from the response. It differs from root-level filters because it allows Elasticsearch to apply some optimizations. With a filtered query, Elasticsearch can exclude non-matching documents first and compute the relevance score and everything else only for the remaining ones. In addition, the filter result can be cached. Root-level filters, on the other hand, are applied after the documents have been found, so after scores etc. have been computed. They are therefore potentially slower than a filtered query.
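The cost difference between the two orders can be made visible with a small plain-Java sketch. It is not Elasticsearch code: the score method is a hypothetical stand-in for relevance computation that simply counts its calls, and the documents are plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterOrderSketch {

  private int scoreComputations = 0;

  // Stand-in for relevance scoring; it only counts how often it is called.
  private double score(String doc) {
    scoreComputations++;
    return doc.length();
  }

  // Filtered-query order: filter first, score only the surviving documents.
  List<String> filterThenScore(List<String> docs, Predicate<String> filter) {
    List<String> hits = new ArrayList<>();
    for (String doc : docs) {
      if (filter.test(doc)) {
        score(doc);
        hits.add(doc);
      }
    }
    return hits;
  }

  // Root-level filter order: score every document, then drop the rejected ones.
  List<String> scoreThenFilter(List<String> docs, Predicate<String> filter) {
    List<String> hits = new ArrayList<>();
    for (String doc : docs) {
      score(doc);
      if (filter.test(doc)) {
        hits.add(doc);
      }
    }
    return hits;
  }

  int scoreComputations() {
    return scoreComputations;
  }

  public static void main(String[] args) {
    List<String> docs = Arrays.asList("Ajax 2-1 PSV", "PSV 0-0 Feyenoord", "Feyenoord 1-3 Twente");
    Predicate<String> ajaxOnly = doc -> doc.contains("Ajax");

    FilterOrderSketch filtered = new FilterOrderSketch();
    filtered.filterThenScore(docs, ajaxOnly);
    System.out.println(filtered.scoreComputations()); // prints 1

    FilterOrderSketch rootLevel = new FilterOrderSketch();
    rootLevel.scoreThenFilter(docs, ajaxOnly);
    System.out.println(rootLevel.scoreComputations()); // prints 3
  }
}
```

Both orders return the same documents; only the amount of scoring work differs, and the gap grows with the number of documents rejected by the filter.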

This time we discovered how to eliminate some documents from Elasticsearch results thanks to filters. At the beginning we listed some popular filters of this search engine. Then we explained the main differences between filters and queries, mostly related to performance. The last part described how to define filters and use them in Elasticsearch Java API queries.
