Routing in Elasticsearch

If you have worked with PHP frameworks like Zend or Symfony, you are certainly familiar with the concept of routing, which dispatches an HTTP request to the appropriate controller. Elasticsearch has a similar feature, which is also called routing.

Routing is related to document indexing, so the first part of this article explains the concept and its influence on data retrieval. The second part shows how routing can influence search results.

What is routing in Elasticsearch?

As we learned in previous articles about Elasticsearch, documents are stored in primary shards and then copied to replica shards. The search engine must therefore be able to locate a document in a deterministic rather than random way. Elasticsearch computes the primary shard of each document with the following formula:

shard_number = hash(routing) % number_of_primary_shards

Here, routing is an arbitrary string that defaults to the document's _id field. The hash function generates a number from the routing value; that number is divided by the number of primary shards in the index, and the remainder is the shard number. This also explains why we can't change the number of primary shards after index creation: the formula would then return different shard numbers for documents that are already indexed, and Elasticsearch could no longer find them.
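The formula can be illustrated with a few lines of Python. This is only a sketch: zlib.crc32 stands in for Elasticsearch's internal hash function, and the document ids and the shard_for helper are made up for the illustration.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard number for a routing value.

    zlib.crc32 is only a stand-in here; it is NOT the hash function
    that Elasticsearch uses internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Hypothetical document ids, used as the default routing value.
doc_ids = ["order-1", "order-2", "order-3", "order-4"]

# With 5 primary shards, every id maps to one stable shard number.
placement_5 = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}

# With 6 primary shards, the same ids may map to other shard numbers,
# which is why the shard count is fixed once the index is created.
placement_6 = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

print(placement_5)
print(placement_6)
```

Running it twice prints the same placements, since the result depends only on the routing value and the number of primary shards.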

As mentioned, the routing value defaults to the document's _id field. However, each Elasticsearch request (get, index, update, delete, bulk) can specify a routing parameter explicitly. Why would we do that? After all, it adds logic that the application has to handle, so it can complicate maintenance. The main benefit of custom routing is precision. Normally, when Elasticsearch handles a search query, it sends the query to all shards of the index. When it knows the routing value, it queries only the shard that stores the documents indexed with that value. An example is an e-commerce system that puts all orders of a user in the same shard by routing on the user id. A search can be routed to a single routing value or to several of them.
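The e-commerce case can be imitated with a toy in-memory index. This is a simplified sketch, not how Elasticsearch is implemented: the ToyIndex class, the user ids and the crc32 hash are all assumptions made for the illustration.

```python
import zlib


class ToyIndex:
    """In-memory imitation of an index split into primary shards."""

    def __init__(self, number_of_primary_shards):
        self.shards = [[] for _ in range(number_of_primary_shards)]

    def _shard_number(self, routing):
        # crc32 is a stand-in for the real hash function.
        return zlib.crc32(routing.encode("utf-8")) % len(self.shards)

    def index(self, document, routing):
        self.shards[self._shard_number(routing)].append(document)

    def search(self, predicate, routing=None):
        """Return (hits, number of shards queried)."""
        if routing is None:
            targets = set(range(len(self.shards)))  # query every shard
        else:
            targets = {self._shard_number(value) for value in routing}
        hits = [doc for n in sorted(targets)
                for doc in self.shards[n] if predicate(doc)]
        return hits, len(targets)


orders = ToyIndex(number_of_primary_shards=5)
for user_id in ("user-1", "user-7", "user-42"):
    for number in (1, 2, 3):
        doc = {"user": user_id, "title": f"Order{number}"}
        orders.index(doc, routing=user_id)  # route by user id


def is_user_7(doc):
    # The filter is still needed: a shard can hold other users' orders too.
    return doc["user"] == "user-7"


routed_hits, routed_shards = orders.search(is_user_7, routing=["user-7"])
all_hits, all_shards = orders.search(is_user_7)

print(routed_shards, all_shards)
```

Both searches return the same three orders, but the routed one looks at a single shard while the unrouted one visits all five.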

Manipulate routes in Elasticsearch

Let's run a simple test on an index storing orders placed in 2015. To do that, we'll route queries by month. But first, here is a basic mapping with the very important _routing parameter (http://localhost:9200/orders):

    {"_routing": {"required": true},
     "properties": {
       "month": {"type": "integer"},
       "title": {"type": "string"}
     }
    }

Unlike a standard mapping, this one contains the definition of the _routing object, which makes routing required for every indexed document. If it is missing, indexing is rejected. Routing information is passed with HTTP requests as a query string parameter: routing=route1,route2.... To index orders for January, we'll call http://localhost:9200/orders/order?routing=1. For February, the call will be http://localhost:9200/orders/order?routing=2. Some sample documents, sent to http://localhost:9200/orders/_bulk, can look like this:
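The routed URLs can be assembled with Python's standard library. This is a minimal sketch: the routed_url helper is hypothetical, and the _search path is an assumption about the endpoint used for searching the order type.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:9200"


def routed_url(path, routes):
    """Build a URL whose routing parameter lists one or more routes."""
    routing = ",".join(str(route) for route in routes)
    # safe="," keeps the comma between routes unescaped.
    query = urlencode({"routing": routing}, safe=",")
    return f"{BASE_URL}/{path}?{query}"


january_index_url = routed_url("orders/order", [1])
both_months_search_url = routed_url("orders/order/_search", [1, 2])

print(january_index_url)       # http://localhost:9200/orders/order?routing=1
print(both_months_search_url)  # http://localhost:9200/orders/order/_search?routing=1,2
```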

{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order1"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order2"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 2, "title": "Order1"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order3"}

This bulk request should fail with an exception, because the action lines carry no routing value and Elasticsearch doesn't know which route applies to each document:

RoutingMissingException[routing is required for [orders]/[order]/[null]]

To make it work, we only need to add a "_routing" entry with the chosen route ("january" or "february") after the "_type" parameter in each index line of the bulk request.
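A routed bulk body can be generated rather than written by hand. The sketch below assumes that month 1 maps to the "january" route and month 2 to "february"; the ROUTES table and the build_bulk_body helper are hypothetical names introduced for the illustration.

```python
import json

# Assumed mapping from the month field to a route name.
ROUTES = {1: "january", 2: "february"}


def build_bulk_body(orders):
    """Return a newline-delimited bulk body with a _routing entry per action."""
    lines = []
    for order in orders:
        action = {"index": {"_index": "orders",
                            "_type": "order",
                            "_routing": ROUTES[order["month"]]}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(order))   # document line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


sample_orders = [
    {"month": 1, "title": "Order1"},
    {"month": 1, "title": "Order2"},
    {"month": 2, "title": "Order1"},
    {"month": 1, "title": "Order3"},
]

bulk_body = build_bulk_body(sample_orders)
print(bulk_body)
```

The resulting string can be sent as the request body to http://localhost:9200/orders/_bulk with any HTTP client.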

Now, let's check what happens if we search for "Order#2" while routing the query to the January route, to the February route, and to both of them:

Routing is an interesting feature for grouping data. At the beginning of this article we saw that correctly defined routing avoids sending a query to shards that certainly don't hold the searched documents. After that we briefly showed how to apply routing at indexing time and how to benefit from it at search time.
