Aliases in Elasticsearch on waitingforcode.com

Nobody is perfect and my name is not nobody. An Elasticsearch mapping, like the mappings of other storage engines, is rarely immutable. Because of that, index changes can cause service downtime, whose length depends on the size of the reindexed data. But there is a trick to avoid this dead time.

Bartosz
Konieczny

This article covers aliases, which can help to limit server downtime. The first part explains the idea of an alias and lists the main use cases. The next part shows how to create aliases.

What are aliases?

To understand the usefulness of aliases, we first need to understand how Elasticsearch stores indexed documents. Documents are stored in segments, and segments are immutable: once written, they never change. If we want to update a document, a new version is written and the old one is marked as deleted. Each segment can contain documents of any type, which means that changing the mapping of one type requires reindexing all the documents. Reindexing is only necessary when the definition of existing fields changes or when elements influencing indexing (such as analyzers) change. If only new fields are added, reindexing is not mandatory.

Reindexing is a pretty trivial task. We only need to create a new index and copy the documents from the old index, for example with the bulk API. The next step consists of pointing the consumer application to the new index. But this requires several actions that can take some time. To avoid this kind of problem, aliases were created.

An alias can be thought of as the Elasticsearch equivalent of a symlink. It is an additional layer between the user and the documents. So far, the layers were (from the bottom up): shard - index - user. The alias is placed between the index and the user. In reality, when the user calls an alias, he calls an index. This is very useful for reindexing because, once the process ends, we can simply switch the alias to the newly created index without interrupting the service. The use cases of aliases can be summarized as:

transparent switch: as already said, aliases sit between the user and the indexes, so we can replace outdated indexes with new ones without any availability issues.
grouping and view organization: like a VIEW in SQL, an alias can be used to expose semantically similar documents through a single query. We can, for example, have one index per month for the products sold in 2015, and then create an alias grouping these 12 indexes into a single name. It could be called "sales_2015".

The Elasticsearch guide itself advises using aliases instead of indices in applications, because they are cheap and help to avoid some problems, such as the one described above for reindexing.
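The grouping use case can be sketched with the _aliases endpoint described in the next section. The snippet below is only an illustration: it builds the request body that would attach the alias "sales_2015" to 12 monthly indexes. The index naming scheme (sales_2015_01 to sales_2015_12) and the class and method names are assumptions made for this example, and the names are not JSON-escaped, so they are expected to be plain identifiers.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupingAlias {

    // Builds the body of a POST /_aliases request that adds one alias
    // to every index of the list. Names are inserted as-is (no JSON escaping).
    static String buildGroupingBody(String alias, List<String> indexes) {
        List<String> actions = new ArrayList<>();
        for (String index : indexes) {
            actions.add(String.format(
                "{\"add\": {\"alias\": \"%s\", \"index\": \"%s\"}}", alias, index));
        }
        return "{\"actions\": [" + String.join(", ", actions) + "]}";
    }

    // Hypothetical naming scheme: <prefix>_01 ... <prefix>_12.
    static List<String> monthlyIndexes(String prefix) {
        List<String> names = new ArrayList<>();
        for (int month = 1; month <= 12; month++) {
            names.add(String.format("%s_%02d", prefix, month));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the body to send to the _aliases endpoint.
        System.out.println(buildGroupingBody("sales_2015", monthlyIndexes("sales_2015")));
    }
}
```

Once such a body is accepted, a search sent to "sales_2015" should query the 12 indexes at once, as a VIEW would do in SQL.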

Aliases example

To set up an alias, we call the _alias endpoint of the index to alias. So, if we created an index called "january" and we want to alias it with the name "month", we should send a PUT request to /january/_alias/month. Aliases can also be managed through the _aliases endpoint, whose body lists the actions to apply to the edited aliases:

{"actions": [
  {"remove": {"alias": "alias_1", "index": "index_1"}}, 
  {"add": {"alias": "alias_2", "index": "index_2"}}
]}
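The "remove" and "add" actions can be combined to implement the transparent switch after reindexing, since the actions of a single _aliases request are applied together. Below is a minimal sketch that builds such a body. The class name, the method name and the index names used in main (waitingforcode_v1, waitingforcode_v2) are hypothetical, and the names are not JSON-escaped.

```java
final class AliasSwap {

    // Builds the body of a POST /_aliases request that moves an alias
    // from the old index to the new one in a single call.
    static String buildSwapBody(String alias, String oldIndex, String newIndex) {
        return String.format(
            "{\"actions\": ["
                + "{\"remove\": {\"alias\": \"%1$s\", \"index\": \"%2$s\"}}, "
                + "{\"add\": {\"alias\": \"%1$s\", \"index\": \"%3$s\"}}"
                + "]}",
            alias, oldIndex, newIndex);
    }

    public static void main(String[] args) {
        // Hypothetical index names: v1 holds the old mapping, v2 the new one.
        System.out.println(
            buildSwapBody("french_football", "waitingforcode_v1", "waitingforcode_v2"));
    }
}
```

The consumer application keeps calling the alias, so it doesn't need to know which index is behind it.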

In our sample project, we create an alias on the index called waitingforcode by sending a PUT request to http://localhost:9200/waitingforcode/_alias/french_football. The response should be:

{"acknowledged": true}

To confirm that the alias was correctly created, we can also send a GET request to http://localhost:9200/waitingforcode/_alias/*. It should list all aliases related to the index waitingforcode:

{"waitingforcode":
  {"aliases":
    {"french_football":{}}
  }
}
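The two HTTP calls above can also be made from plain Java. The following sketch uses the java.net.http client shipped with Java 11 and later, not the Elasticsearch Java API. The class and method names are made up for this example, and it assumes a node listening on http://localhost:9200. The main method only prints the requests (a dry run), while send() is the part that really calls the server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class AliasRequests {

    // PUT /{index}/_alias/{alias} creates the alias on the given index.
    static HttpRequest createAlias(String baseUrl, String index, String alias) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_alias/" + alias))
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // GET /{index}/_alias/* lists the aliases attached to the index.
    static HttpRequest listAliases(String baseUrl, String index) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_alias/*"))
            .GET()
            .build();
    }

    // Sends the request and returns the raw response body.
    // Requires a reachable Elasticsearch node.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Dry run: shows the requests without contacting any server.
        String baseUrl = "http://localhost:9200";
        HttpRequest create = createAlias(baseUrl, "waitingforcode", "french_football");
        HttpRequest list = listAliases(baseUrl, "waitingforcode");
        System.out.println(create.method() + " " + create.uri());
        System.out.println(list.method() + " " + list.uri());
    }
}
```

With a running node, passing the first request to send() should return the acknowledgment shown earlier, and the second one the list of aliases.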

In the Java API, there are no differences between aliases and indices. The method used to generate a request for a given type stored in the index doesn't change. The prepareSearch(String... indices) method accepts an alias name as well as an index name:

private SearchRequestBuilder buildSearchRequestForType(String type) {
  return elasticSearchClient.prepareSearch(ElasticSearchConfig.ALIAS).setTypes(type);
}

This short article presents a concept that is very important from the maintainability point of view: aliases. Thanks to them, we can refactor our mapping without any downtime. They are also useful when we need to group logically related indexes together. We also saw that adding or modifying an alias is a matter of one simple HTTP request, and that the use of aliases is transparent for Java API code.
