Nobody is perfect and my name is not nobody. Like the schemas of other storage engines, an Elasticsearch mapping rarely stays unchanged forever. Because of that, index changes can cause service downtime, depending on the size of the reindexed data. But there is a trick to avoid these dead times.
This article covers aliases, which can help limit server downtime. The first part explains the idea of an alias and lists its main use cases. The next part shows how to create aliases.
What are aliases?
To understand the utility of aliases, we first need to understand how Elasticsearch stores indexed documents. Documents are stored in segments, and segments are immutable: once written, they never change. If we want to update a document, a new version is written and the old one is marked as deleted. Each segment can contain documents of any type, which means that changing the mapping of one type requires reindexing all documents. Reindexing is only necessary when a field definition changes or when elements influencing indexing (such as analyzers) change. Adding new fields doesn't require reindexing.
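The update mechanism described above can be pictured with a toy Java model. It is a minimal sketch under simplifying assumptions, not Lucene's actual implementation: every write goes into a new segment that is never modified afterwards, and an update only flags the older version as deleted. The class ToyIndex and its "segment:position" deletion markers are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Toy model only: Lucene's real segments, live-docs bitsets and merges are far more involved.
final class ToyIndex {
    // One stored version of a document.
    static final class StoredDoc {
        final String id;
        final String source;

        StoredDoc(String id, String source) {
            this.id = id;
            this.source = source;
        }
    }

    // Segments are written once and never modified afterwards (List.of is immutable).
    private final List<List<StoredDoc>> segments = new ArrayList<>();
    // Deletions live outside the segments, as hypothetical "segment:position" markers.
    private final Set<String> deleted = new HashSet<>();

    // Indexing or updating a document always writes it into a brand-new segment.
    void index(String id, String source) {
        for (int s = 0; s < segments.size(); s++) {
            List<StoredDoc> segment = segments.get(s);
            for (int p = 0; p < segment.size(); p++) {
                if (segment.get(p).id.equals(id)) {
                    // The older version is only flagged; its bytes stay in the old segment.
                    deleted.add(s + ":" + p);
                }
            }
        }
        segments.add(List.of(new StoredDoc(id, source)));
    }

    // Returns the live version of a document, ignoring versions flagged as deleted.
    Optional<String> get(String id) {
        for (int s = 0; s < segments.size(); s++) {
            List<StoredDoc> segment = segments.get(s);
            for (int p = 0; p < segment.size(); p++) {
                StoredDoc doc = segment.get(p);
                if (doc.id.equals(id) && !deleted.contains(s + ":" + p)) {
                    return Optional.of(doc.source);
                }
            }
        }
        return Optional.empty();
    }

    // Every stored version, deleted ones included, still occupies space.
    int storedVersions() {
        int count = 0;
        for (List<StoredDoc> segment : segments) {
            count += segment.size();
        }
        return count;
    }

    // Live documents are the stored versions minus the flagged ones.
    int liveDocs() {
        return storedVersions() - deleted.size();
    }
}
```

After writing document "1" twice, two versions are stored but only the second one is live. Since nothing already written is rewritten in place, this hints at why documents indexed under an old field definition have to be indexed again to follow a new one.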
Reindexing is a fairly simple task. We only need to create a new index and copy the documents from the old one, for example with the bulk API. The next step consists of making the consumer application call this new index. But this requires several actions that can take some time, during which the service may be interrupted. Aliases were created to avoid this kind of problem.
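The copy step can be sketched as a helper that builds the body of a Bulk API request. This sketch assumes that the documents were already read from the old index (for example with a scroll search), that the target index (plus the type, on older versions that still use types) is given in the URL, e.g. POST /waitingforcode_v2/_bulk where waitingforcode_v2 is a hypothetical name, and that ids need no JSON escaping. The class BulkCopy is hypothetical.

```java
import java.util.Map;

// Hypothetical helper for the copy step of a reindexing.
final class BulkCopy {
    // Builds a Bulk API body (newline-delimited JSON) from documents already read from the old index.
    // Assumes ids need no JSON escaping and that the target index is given in the request URL.
    static String buildBulkBody(Map<String, String> sourcesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : sourcesById.entrySet()) {
            // Action line: index the document into the target index under its original id.
            body.append("{\"index\":{\"_id\":\"").append(doc.getKey()).append("\"}}\n");
            // Source line: the original JSON document, copied unchanged.
            body.append(doc.getValue()).append('\n');
        }
        // Every line, including the last one, ends with '\n' as the Bulk API requires.
        return body.toString();
    }
}
```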
An alias can be thought of as a symbolic link (symlink) in Elasticsearch. It is a supplementary layer between the user and the documents, which so far were presented as (from the bottom level): shard - index - user. This new layer is placed between the index and the user: when the user calls an alias, the request actually reaches the index behind it. This is very useful for reindexing, because once the process ends we can simply switch the alias to the newly created index without interrupting the service. The use cases of aliases can be summarized as:
- transparent switch: as already said, aliases sit on top of indices, so we can replace outdated indices with new ones without any availability issues.
- grouping and view organization: like a VIEW in SQL, an alias can expose semantically similar documents through a single query. For example, we can have one index per month for the products sold in 2015 and create an alias, called for instance "sales_2015", that groups these 12 indices under a single name.
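Both use cases can be pictured with a toy Java model of the symlink idea, assuming an alias is nothing more than a name mapped to one or more index names. This is not how Elasticsearch stores aliases internally; AliasRegistry and the index names waitingforcode_v2, sales_2015_01 and sales_2015_02 are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of "alias = symlink"; not how Elasticsearch stores aliases internally.
final class AliasRegistry {
    // Alias name -> indices it points to: one index for a switch, several for a grouping.
    private final Map<String, List<String>> aliases = new ConcurrentHashMap<>();

    // Points the alias to new indices; one put replaces the whole target list at once,
    // so a concurrent resolve sees either the old target or the new one.
    void point(String alias, List<String> indices) {
        aliases.put(alias, List.copyOf(indices));
    }

    // Resolves the name sent by the user: an alias expands to its indices,
    // any other name is treated as a plain index name.
    List<String> resolve(String name) {
        return aliases.getOrDefault(name, List.of(name));
    }
}
```

A consumer that always queries french_football keeps working while the alias is switched from waitingforcode to a reindexed copy, and a query on sales_2015 expands to every monthly index behind it.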
The Elasticsearch guide itself advises using aliases instead of raw index names, because they are cheap and help avoid some problems, such as the reindexing one presented above.
Alias examples
To set up an alias, we call the _alias endpoint of the aliased index. So, if we created an index called "january" and want to alias it with the name "month", we send a PUT request to /january/_alias/month. Aliases can also be managed with a POST request to the _aliases endpoint, whose body lists the actions (add, remove) to apply to the edited aliases:
```json
{
  "actions": [
    {"remove": {"alias": "alias_1", "index": "index_1"}},
    {"add": {"alias": "alias_2", "index": "index_2"}}
  ]
}
```
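In Java, such a body can be assembled with a small builder and sent with the JDK 11+ HTTP client. This is a sketch under assumptions: a node on http://localhost:9200, alias and index names that need no JSON escaping, and a hypothetical class AliasActions. The Elasticsearch reference describes a remove followed by an add in the same _aliases request as atomic, which is what makes the switch after reindexing safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for the body of a POST /_aliases request.
// Assumes alias and index names need no JSON escaping.
final class AliasActions {
    private final List<String> actions = new ArrayList<>();

    // Queues an "add" action pointing the alias to the index.
    AliasActions add(String index, String alias) {
        actions.add(action("add", index, alias));
        return this;
    }

    // Queues a "remove" action detaching the alias from the index.
    AliasActions remove(String index, String alias) {
        actions.add(action("remove", index, alias));
        return this;
    }

    // Serializes the queued actions in the order they were added.
    String toJson() {
        return "{\"actions\":[" + String.join(",", actions) + "]}";
    }

    // Builds the request for a node such as http://localhost:9200 (assumed address).
    HttpRequest toRequest(URI cluster) {
        return HttpRequest.newBuilder(cluster.resolve("/_aliases"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(toJson()))
            .build();
    }

    private static String action(String type, String index, String alias) {
        return "{\"" + type + "\":{\"alias\":\"" + alias + "\",\"index\":\"" + index + "\"}}";
    }
}
```

For the reindexing switch, new AliasActions().remove("waitingforcode", "french_football").add("waitingforcode_v2", "french_football") produces a body with the same shape as the one above (waitingforcode_v2 being a hypothetical reindexed copy). Calling add twelve times with hypothetical monthly index names such as sales_2015_01 builds the "sales_2015" grouping.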
In our sample project, we create an alias called french_football on the index waitingforcode by sending a PUT request to http://localhost:9200/waitingforcode/_alias/french_football. The response should be:
```json
{"acknowledged": true}
```
To confirm that the alias was correctly created, we can send a GET request to http://localhost:9200/waitingforcode/_alias/*. It should list all aliases related to the index waitingforcode:
```json
{
  "waitingforcode": {
    "aliases": {
      "french_football": {}
    }
  }
}
```
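The same two calls can be sketched with the JDK 11+ HTTP client. The sketch assumes a node on http://localhost:9200; the class WaitingForCodeAliases is hypothetical, and send only works against a running cluster.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for the sample project's alias calls; the node address is an assumption.
final class WaitingForCodeAliases {
    private static final URI CLUSTER = URI.create("http://localhost:9200");

    // PUT /{index}/_alias/{alias} creates the alias (no body needed).
    static HttpRequest createAlias(String index, String alias) {
        return HttpRequest.newBuilder(CLUSTER.resolve("/" + index + "/_alias/" + alias))
            .PUT(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // GET /{index}/_alias/* lists every alias attached to the index.
    static HttpRequest listAliases(String index) {
        return HttpRequest.newBuilder(CLUSTER.resolve("/" + index + "/_alias/*"))
            .GET()
            .build();
    }

    // Sends a request and returns the raw JSON body; requires a reachable cluster.
    static String send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

With a running node, send(createAlias("waitingforcode", "french_football")) should return the acknowledgment shown earlier, and send(listAliases("waitingforcode")) the alias listing.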
In the Java API, there is no difference between aliases and indices. The method used to build a search request for a given type stored in the index doesn't change, because prepareSearch(String... indices) accepts alias names as well as index names:
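```java
// ElasticSearchConfig.ALIAS is passed to prepareSearch exactly like an index name would be.
private SearchRequestBuilder buildSearchRequestForType(String type) {
    return elasticSearchClient.prepareSearch(ElasticSearchConfig.ALIAS).setTypes(type);
}
```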
This short article presented a concept that is very important from the maintainability point of view: aliases. Thanks to them, we can refactor our mappings without any downtime. They are also useful when we need to group logically related indices together. We also saw that adding or modifying an alias is a matter of one simple HTTP request, and that using aliases is transparent for Java API code.