Routing in Elasticsearch

If you have worked with PHP frameworks like Zend or Symfony, you are certainly familiar with the concept of routing, which dispatches an HTTP request to the appropriate controller. Elasticsearch has a similar feature, also called routing.

Routing is tied to document indexing, so the first part of this article explains the concept and its influence on data retrieval. The second part shows how routing choices can influence search results.

What is routing in Elasticsearch?

As we learned in previous articles about Elasticsearch, documents are stored in primary shards and, optionally, copied to replica shards. The search engine must therefore be able to locate a document in a deterministic, not random, way. Elasticsearch computes the primary shard of each document with the following formula:

shard_number = hash(routing) % number_of_primary_shards

Here routing is an arbitrary string that defaults to the document's _id field. The hash function generates a number from the routing parameter. That number is divided by the number of primary shards in the index, and the remainder is the shard number. This also explains why we can't change the number of primary shards after index creation: with a different shard count the formula would point to other shards, so documents that are already indexed could no longer be found.
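The formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: zlib.crc32 stands in for the hash function Elasticsearch uses internally, and the name shard_for is hypothetical. It shows that the same routing value may land on a different shard once the number of primary shards changes.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    zlib.crc32 is an assumption: a stand-in for the hash function
    that Elasticsearch applies internally.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


if __name__ == "__main__":
    for routing in ["1", "2", "user-42"]:
        with_5 = shard_for(routing, 5)
        with_6 = shard_for(routing, 6)
        # The same routing value can map to another shard when the count changes.
        print(f"routing={routing!r}: 5 shards -> {with_5}, 6 shards -> {with_6}")
```

The result is always the same for a given routing value and shard count, which is what makes lookups deterministic.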

As mentioned, the routing parameter defaults to the document's _id field. However, every Elasticsearch request (get, index, update, delete, bulk) can specify its own routing parameter. Why would we do that? It adds logic that the application has to handle, so it can complicate maintenance. The main benefit of routes is precision. Normally, when Elasticsearch handles a search query, it sends the query to all shards of the index. But when it knows the routing value, it queries only the shard that stores the searched documents. An example is an e-commerce system that puts all orders of a user in one specific shard by routing on the user id. A search can be routed to a single route or to several routes.
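The e-commerce example can be simulated without a running cluster. The sketch below is a simplified model, not Elasticsearch code: the shard count of 5, the zlib.crc32 hash and all function names are assumptions made for illustration. It shows that orders routed by user id end up together, and that a routed search visits fewer shards than an unrouted one.

```python
import zlib
from collections import defaultdict

NUMBER_OF_PRIMARY_SHARDS = 5  # assumed value for this illustration


def shard_for(routing: str) -> int:
    # zlib.crc32 is a stand-in for Elasticsearch's internal hash function.
    return zlib.crc32(routing.encode("utf-8")) % NUMBER_OF_PRIMARY_SHARDS


def index_orders(orders):
    """Place each (user_id, order_title) pair on a shard, routed by user id."""
    shards = defaultdict(list)
    for user_id, title in orders:
        shards[shard_for(user_id)].append((user_id, title))
    return shards


def shards_to_query(routing_values=None):
    """Return the shard numbers a search has to visit."""
    if not routing_values:
        # No routing: the query goes to every shard of the index.
        return set(range(NUMBER_OF_PRIMARY_SHARDS))
    return {shard_for(value) for value in routing_values}


if __name__ == "__main__":
    orders = [("user-1", "Order1"), ("user-2", "Order1"), ("user-1", "Order2")]
    print(dict(index_orders(orders)))
    print(shards_to_query())                      # every shard
    print(shards_to_query(["user-1"]))            # a single shard
    print(shards_to_query(["user-1", "user-2"]))  # at most two shards
```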

Manipulate routes in Elasticsearch

Let's run a simple test on an index storing orders placed in 2015. To do that, we'll route queries by month. But first, here is a basic mapping with the very important routing parameter (http://localhost:9200/orders):

{
  "mappings": {
    "order": {
      "_routing": {"required": true},
      "properties": {
        "month": {"type": "integer"},
        "title": {"type": "string"}
      }
    }
  }
}

Unlike a standard mapping, this one contains the definition of the _routing object, which makes routing required for every indexed document. If it is missing, indexing is rejected. Routing information is passed with HTTP requests as a query string parameter called routing (routing=route1,route2...). To index some orders for January, we'll call http://localhost:9200/orders/order?routing=1. For February, the call will be http://localhost:9200/orders/order?routing=2. Some sample indexed documents can look like below (sent to http://localhost:9200/orders/_bulk):

{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order1"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order2"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 2, "title": "Order1"}
{"index": {"_index": "orders", "_type": "order"}}
{"month": 1, "title": "Order3"}

This bulk request is expected to fail with an exception, because Elasticsearch doesn't know which route applies to each indexed document:

RoutingMissingException[routing is required for [orders]/[order]/[null]]

To make it work, we only need to add a "_routing": "january|february" entry, with the route of the given document as its value, after the "_type" parameter in each index line of the bulk request.
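A bulk body with the routing entry on every index line can be generated rather than written by hand. The sketch below makes a few assumptions: the helper name build_bulk_body is hypothetical, and the month number is used as the routing value, in line with the routing=1 and routing=2 calls above. It only builds the request body and does not send it.

```python
import json


def build_bulk_body(orders, index="orders", doc_type="order"):
    """Build a newline-delimited bulk body, routing each order by its month."""
    lines = []
    for order in orders:
        action = {
            "index": {
                "_index": index,
                "_type": doc_type,
                # Assumption: the month number, as a string, is the route.
                "_routing": str(order["month"]),
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(order))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    orders = [
        {"month": 1, "title": "Order1"},
        {"month": 1, "title": "Order2"},
        {"month": 2, "title": "Order1"},
        {"month": 1, "title": "Order3"},
    ]
    print(build_bulk_body(orders), end="")
```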

Now, let's check what happens when we search for "Order2" with the query routed to the January route, to the February route, and to both of them.
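The three routed searches could be built as follows. This is a sketch under assumptions: the _search endpoint on the orders index and order type, the q=title:Order2 query string and the helper name routed_search_url are not taken from the original test, and routes are the month numbers 1 and 2. The code only constructs the URLs. The responses depend on the cluster and are not reproduced here.

```python
from urllib.parse import urlencode

# Assumed search endpoint for the orders index and the order type.
BASE_URL = "http://localhost:9200/orders/order/_search"


def routed_search_url(query, routes):
    """Build a search URL whose routing parameter lists the given routes."""
    params = {"q": query, "routing": ",".join(str(route) for route in routes)}
    # Keep ':' and ',' readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=":,")


if __name__ == "__main__":
    # January only, February only, then both routes at once.
    for routes in ([1], [2], [1, 2]):
        print(routed_search_url("title:Order2", routes))
```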

Routing is an interesting feature for grouping data. At the beginning of this article we saw that correctly defined routing avoids sending a query to shards that certainly don't hold the searched documents. After that we briefly showed how to set routing in the indexing phase and how to benefit from it in the search step.
