Locks in Elasticsearch

Concurrency issues in Elasticsearch often stem from the lack of support for ACID transactions. However, the search engine provides some locking mechanisms to deal with them.

This article covers the two locking methods available for documents. The first part introduces global locking. The next one describes locking at the document level. The last part presents the lock mechanism implemented in the Elasticsearch Java API. The remaining lock mode, tree locking, only fits special cases such as access to directory trees, which is why it isn't presented here.

Global locking

If no locking mechanism is defined, the most recently executed modification of a document always wins. This can lead to inconsistent situations, for example when a change computed by an operation that started earlier is applied after, and overwrites, the change made by a later operation. This can be controlled with a system of locks, starting with global locks.
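To make the last-write-wins problem concrete, the following sketch replays one unlucky interleaving of two clients. It is purely illustrative: a plain in-memory map stands in for the books index, the class and method names are hypothetical, and nothing here calls Elasticsearch. Client A reads the document first, client B reads it and writes its change, and then A's stale write lands last and silently overwrites B's change:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index without any locking:
// every write simply replaces the stored value (last write wins).
class LastWriteWins {
    private final Map<String, String> docs = new HashMap<>();

    void write(String id, String title) {
        docs.put(id, title);
    }

    String read(String id) {
        return docs.get(id);
    }

    // Replays two clients' reads and writes in a fixed order and returns the final title.
    static String simulate() {
        LastWriteWins index = new LastWriteWins();
        index.write("BOOK1", "Title_1");

        String seenByA = index.read("BOOK1");             // A starts first...
        String seenByB = index.read("BOOK1");             // ...then B reads the same version
        index.write("BOOK1", seenByB + " (edited by B)"); // B finishes first
        index.write("BOOK1", seenByA + " (edited by A)"); // A's older change lands last
        return index.read("BOOK1");
    }
}
```

`LastWriteWins.simulate()` returns `"Title_1 (edited by A)"`: B's change is lost without any error being reported.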

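As the name indicates, this kind of lock applies globally to all documents: while it's held, only one change can be made at a given moment. The lock is acquired with an explicit PUT request on /fs/lock/global/_create and released with a DELETE request on /fs/lock/global, ideally once all changes have been applied. Note that the lock is only a convention between clients. It's an ordinary document in the fs index, and it works because a _create request fails with a 409 conflict when a document with the same id already exists. Elasticsearch doesn't block writes to other indices while the lock is held, so every writer has to try to acquire it first.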
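A client following this convention could look like the sketch below. It assumes JDK 11+ (`java.net.http`) and the `http://localhost:9200` address used in the examples; the class and method names are hypothetical. The request building is kept separate from sending so the shape of each call is easy to see, and a 201 status is interpreted as "lock acquired" while 409 means another client holds it:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client for the global lock convention; the base URL is an assumption.
class GlobalLock {
    static final String BASE = "http://localhost:9200";

    // PUT /fs/lock/global/_create succeeds only if the lock document doesn't exist yet.
    static HttpRequest acquireRequest() {
        return HttpRequest.newBuilder(URI.create(BASE + "/fs/lock/global/_create"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString("{}"))
                .build();
    }

    // DELETE /fs/lock/global removes the lock document, i.e. releases the lock.
    static HttpRequest releaseRequest() {
        return HttpRequest.newBuilder(URI.create(BASE + "/fs/lock/global"))
                .DELETE()
                .build();
    }

    // 201 Created: we own the lock. 409 Conflict: someone else holds it.
    static boolean isAcquired(int statusCode) {
        return statusCode == 201;
    }

    static boolean tryAcquire(HttpClient client) throws Exception {
        HttpResponse<String> response =
                client.send(acquireRequest(), HttpResponse.BodyHandlers.ofString());
        return isAcquired(response.statusCode());
    }

    static void release(HttpClient client) throws Exception {
        client.send(releaseRequest(), HttpResponse.BodyHandlers.discarding());
    }
}
```

A caller would typically write `if (GlobalLock.tryAcquire(client)) { try { /* changes */ } finally { GlobalLock.release(client); } }` so that the lock is released even when an update fails. As explained above, this only protects writers that go through the same code.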

However, this mechanism has a serious drawback: it's global, so even if somebody wants to change a different document than the one modified by the lock holder, they won't be able to do so. This hurts performance, especially in environments expected to handle many concurrent changes.

Before presenting another locking mechanism, let's see how the global lock behaves in practice. To do so, we execute the following requests:

We begin by creating an index with PUT http://localhost:9200/books:

{"mappings": {"book": {
  "properties" : {"title" : { "type" : "string"}}
}}}

Elasticsearch acknowledges the index creation:

{"acknowledged":true}

Now we index some documents with POST http://localhost:9200/books/book/_bulk:

{"index": {"_index": "books", "_type": "book", "_id": "BOOK1"}}
{"title": "Title_1"}
{"index": {"_index": "books", "_type": "book", "_id": "BOOK2"}}
{"title": "Title_2"}

{"took":2,"errors":false,"items":[{"index":{"_index":"books","_type":"book","_id":"BOOK1","_version":1,"status":201}},{"index":{"_index":"books","_type":"book","_id":"BOOK2","_version":1,"status":201}}]}
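The bulk body is newline-delimited JSON: one action line followed by one source line per document, and the bulk API expects the body to end with a newline. The hypothetical helper below builds the body used above from ids and titles. It only concatenates strings and does no JSON escaping, so it's suitable for simple values like these, not for arbitrary user input:

```java
import java.util.Map;

// Hypothetical helper building a bulk body for the books index.
class BulkBody {
    // Pass an ordered map (e.g. LinkedHashMap) to keep the document order.
    // Titles are inserted as-is, without JSON escaping.
    static String indexBooks(Map<String, String> titlesById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : titlesById.entrySet()) {
            body.append("{\"index\": {\"_index\": \"books\", \"_type\": \"book\", \"_id\": \"")
                .append(e.getKey()).append("\"}}\n");
            body.append("{\"title\": \"").append(e.getValue()).append("\"}\n");
        }
        return body.toString(); // every line, including the last one, ends with \n
    }
}
```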

We acquire the global lock for all documents with PUT http://localhost:9200/fs/lock/global/_create, sending an empty body:
```
{}
```
```
{"_index":"fs","_type":"lock","_id":"global","_version":1,"created":true}
```
If we try to acquire the global lock once again, Elasticsearch rejects the request with a DocumentAlreadyExistsException (HTTP 409) because the lock document already exists:
```
{"error":"DocumentAlreadyExistsException[[fs][0] [lock][global]: document already exists]","status":409}
```
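Since a held lock shows up as a 409 conflict, a client that wants to wait for the lock has to retry the _create call. A minimal sketch of such a loop, assuming the caller supplies a function that performs one acquisition attempt and returns its HTTP status code (the class name, the number of attempts and the delay are hypothetical choices, not part of Elasticsearch):

```java
import java.util.function.IntSupplier;

// Hypothetical retry helper around one lock acquisition attempt.
class LockRetry {
    // Calls attempt until it returns 201 (lock created) or maxAttempts is reached.
    // Any status other than 201 and 409 is treated as an error.
    static boolean acquireWithRetry(IntSupplier attempt, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int i = 1; i <= maxAttempts; i++) {
            int status = attempt.getAsInt();
            if (status == 201) {
                return true;                 // we now own the lock
            }
            if (status != 409) {
                throw new IllegalStateException("Unexpected status " + status);
            }
            if (i < maxAttempts) {
                Thread.sleep(delayMillis);   // held by someone else: wait, then retry
            }
        }
        return false;                        // gave up, the lock is still held elsewhere
    }
}
```

A fixed delay keeps the sketch short; with many competing clients, an increasing delay would reduce the number of rejected requests.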
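Now we edit the first indexed book (BOOK1) while the global lock is still held, with POST http://localhost:9200/books/book/BOOK1/_update. The update succeeds because Elasticsearch doesn't consult the fs index before writing: the lock only works if every client acquires it before changing documents.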
```
{"doc": {"title": "new title for Book1"}}
```
```
{"_index":"books","_type":"book","_id":"BOOK1","_version":2}
```

We release the global lock with DELETE http://localhost:9200/fs/lock/global. Releasing the lock means deleting its document from the fs index. We can check it by listing all available indices (http://localhost:9200/_cat/indices) before executing the delete request; the fs index then contains 1 document (sixth column, docs.count):

yellow open fs                 5 1     1 0   2.5kb   2.5kb 
yellow open waitingforcode     5 1  5124 0 757.5kb 757.5kb 
yellow open books              5 1     2 0   5.1kb   5.1kb

The response to the release request should be:

{"found":true,"_index":"fs","_type":"lock","_id":"global","_version":2}

Afterwards, the fs index should be empty:

yellow open fs                 5 1     0 0    575b    575b 
yellow open waitingforcode     5 1  5124 0 757.5kb 757.5kb 
yellow open books              5 1     2 0   5.1kb   5.1kb
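To check the release programmatically instead of by eye, the docs.count value can be read from the _cat/indices output. The sketch below assumes the 9-column layout shown above (health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); other Elasticsearch versions may print different columns. The class name is hypothetical:

```java
import java.util.OptionalLong;

// Hypothetical parser for the _cat/indices layout shown above.
class CatIndices {
    // Returns docs.count (6th column) for the given index, or empty if it isn't listed.
    static OptionalLong docsCount(String catOutput, String indexName) {
        for (String line : catOutput.split("\n")) {
            String[] columns = line.trim().split("\\s+");
            if (columns.length >= 6 && columns[2].equals(indexName)) {
                return OptionalLong.of(Long.parseLong(columns[5]));
            }
        }
        return OptionalLong.empty();
    }
}
```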

Document locking

Document locking provides more fine-grained control. As the name indicates, the lock concerns only the modified documents. It works similarly to global locking; the only significant difference is the process_id attribute, defined when the lock is acquired and used again when it's released. It identifies the process that owns a given lock.

Let's see how it works in a real example, still using the books index created in the global locking part. We lock both books with POST http://localhost:9200/fs/lock/_bulk:

{"create": { "_id": "BOOK1"}} 
{"process_id": 1} 
{"create": { "_id": "BOOK2"}}
{"process_id": 1}

Two important points must be kept in mind here. Firstly, the _id of each lock document must match the id of the locked document. Secondly, process_id identifies the process acquiring the lock, and the same process_id must be used to release it, as the next-to-last step shows.

{"took":1,"errors":false,"items":[{"create":{"_index":"fs","_type":"lock","_id":"BOOK1","_version":1,"status":201}},{"create":{"_index":"fs","_type":"lock","_id":"BOOK2","_version":1,"status":201}}]}
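When the set of documents to lock is only known at runtime, the lock request body above can be generated. A minimal sketch, with a hypothetical class name, that builds both the bulk body for acquiring the locks and the term query body later used to release them by process_id (ids are inserted without JSON escaping):

```java
import java.util.List;

// Hypothetical builder for the document lock request bodies used in this section.
class DocumentLockBodies {
    // Bulk body for POST /fs/lock/_bulk: one create action per locked document id.
    static String acquireBody(List<String> documentIds, long processId) {
        StringBuilder body = new StringBuilder();
        for (String id : documentIds) {
            body.append("{\"create\": {\"_id\": \"").append(id).append("\"}}\n");
            body.append("{\"process_id\": ").append(processId).append("}\n");
        }
        return body.toString();
    }

    // Query body matching every lock owned by one process, for the release step.
    static String releaseBody(long processId) {
        return "{\"query\": {\"term\": {\"process_id\": " + processId + "}}}";
    }
}
```

Using create actions rather than index actions is what turns each bulk item into a lock: an index action would silently overwrite an existing lock document instead of failing with a 409.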

If we execute the previous request once again, each item fails with a DocumentAlreadyExistsException, as in the case of the global lock:

{"took":1,"errors":true,"items":[{"create":{"_index":"fs","_type":"lock","_id":"BOOK1","status":409,"error":"DocumentAlreadyExistsException[[fs][3] [lock][BOOK1]: document already exists]"}},{"create":{"_index":"fs","_type":"lock","_id":"BOOK2","status":409,"error":"DocumentAlreadyExistsException[[fs][4] [lock][BOOK2]: document already exists]"}}]}

The exception is returned even if we change the process_id attribute, because the conflict depends only on the _id of the lock document.
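The semantics seen so far can be summarized with a small in-memory model. This is a hypothetical illustration, not how Elasticsearch stores the fs documents: a map from locked document id to process_id, where `putIfAbsent` plays the role of the _create request (it fails whatever process_id the new request carries) and the release removes every lock owned by one process, like the delete-by-query used later:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the fs/lock documents:
// key = locked document id, value = process_id of the owner.
class DocumentLockTable {
    private final Map<String, Long> locks = new ConcurrentHashMap<>();

    // Like a _create request: fails if a lock with this id exists,
    // regardless of the process_id carried by the new request.
    boolean tryLock(String documentId, long processId) {
        return locks.putIfAbsent(documentId, processId) == null;
    }

    // Like the delete-by-query on process_id: removes every lock owned by the process
    // and returns how many were released.
    int releaseAll(long processId) {
        int released = 0;
        Iterator<Map.Entry<String, Long>> it = locks.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() == processId) {
                it.remove();
                released++;
            }
        }
        return released;
    }
}
```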

With the document lock acquired, let's modify the first book with POST http://localhost:9200/books/book/BOOK1/_update:
```
{"doc": {"title": "new title for Book1 from document locking"}}
```
A new version should be created:
```
{"_index":"books","_type":"book","_id":"BOOK1","_version":3}
```
Next, we refresh the fs index with POST http://localhost:9200/fs/_refresh. This call makes all operations executed since the last refresh visible to the search performed in the next step:
```
{"_shards":{"total":10,"successful":5,"failed":0}}
```

Finally, we release the document locks acquired at the beginning of the test. The release deletes every lock owned by process 1 with a delete-by-query request (DELETE http://localhost:9200/fs/lock/_query):

{
  "query": {
    "term": {
      "process_id": 1
    }
  }
}

{"_indices":{"fs":{"_shards":{"total":5,"successful":5,"failed":0}}}}

After that, another process can successfully acquire locks on these documents.

Locks in Java API

Under the hood, Elasticsearch's Java implementation acquires locks automatically for some operations. For instance, the engine's get method takes the read lock:

@Override
public GetResult get(Get get) throws EngineException {
  // try-with-resources: ReleasableLock.close() releases the read lock when the block exits
  try (ReleasableLock lock = readLock.acquire()) {
    // ... get operations
  }
}

ReleasableLock is a class located in the org.elasticsearch.common.util.concurrent package. It's a simple wrapper around an implementation of java.util.concurrent.locks.Lock. In the engine, the wrapped locks come from a ReentrantReadWriteLock, as we can see in the abstract org.elasticsearch.index.engine.Engine class:

protected final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
protected final ReleasableLock readLock = new ReleasableLock(rwl.readLock());
protected final ReleasableLock writeLock = new ReleasableLock(rwl.writeLock());
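The pattern is easy to reproduce outside Elasticsearch. The following is a simplified, hypothetical re-implementation, named SimpleReleasableLock to avoid confusion with the real class: acquire() locks the wrapped Lock and returns the wrapper itself, and close() unlocks it, so try-with-resources releases the lock even if the body throws. The MiniEngine class is also hypothetical and only mirrors the field layout shown above:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for Elasticsearch's ReleasableLock.
final class SimpleReleasableLock implements AutoCloseable {
    private final Lock lock;

    SimpleReleasableLock(Lock lock) {
        this.lock = lock;
    }

    SimpleReleasableLock acquire() {
        lock.lock();
        return this;
    }

    @Override
    public void close() {   // declares no checked exception, so callers need no catch block
        lock.unlock();
    }
}

// Hypothetical engine reproducing the Engine field layout shown above.
class MiniEngine {
    final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    final SimpleReleasableLock readLock = new SimpleReleasableLock(rwl.readLock());
    final SimpleReleasableLock writeLock = new SimpleReleasableLock(rwl.writeLock());

    String get(String id) {
        try (SimpleReleasableLock lock = readLock.acquire()) {
            // the read lock is held only for the duration of this block
            return "doc " + id + " (read locks held: " + rwl.getReadLockCount() + ")";
        }
    }
}
```

Calling `new MiniEngine().get("BOOK1")` returns `"doc BOOK1 (read locks held: 1)"`, and once it returns, `getReadLockCount()` is back to 0.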

ReentrantReadWriteLock contains two locks: one for read and one for write operations. Elasticsearch uses this distinction to separate sensitive operations from the others. The first group, protected by the write lock, includes index recovery and synced flush. These operations relate more to the shard state than to documents. Document operations, such as creation, update and deletion, take the read lock instead. Since the read lock is shared, many document operations can run in parallel while still being excluded during the shard-level operations that hold the write lock.

The difference between the read and write locks comes from ReentrantReadWriteLock, which controls lock acquisition in the following situations:
- when a thread tries to acquire the read lock while another thread holds the write lock, it waits until the write lock is released: the read lock can be acquired only when no other thread holds the write lock.
- the same rule applies to the write lock, with one additional check: no thread may hold the read lock either. The write lock can be acquired only when no thread holds the read lock and no other thread holds the write lock.
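These rules can be observed with a small experiment using only the JDK. In the sketch below (the class name is hypothetical), the main thread holds one lock and a second thread calls tryLock() on the other one; tryLock() returns immediately instead of waiting, which makes the outcome easy to check:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical experiment checking the ReentrantReadWriteLock acquisition rules.
class ReadWriteRules {
    // Tries to take the lock from another thread, releasing it right away on success.
    static boolean canAcquireFromOtherThread(Lock lock) throws Exception {
        ExecutorService other = Executors.newSingleThreadExecutor();
        try {
            return other.submit(() -> {
                boolean acquired = lock.tryLock();
                if (acquired) {
                    lock.unlock();
                }
                return acquired;
            }).get();
        } finally {
            other.shutdown();
        }
    }

    // Returns {readWhileWriteHeld, writeWhileReadHeld, readWhileReadHeld}.
    static boolean[] check() throws Exception {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        boolean[] results = new boolean[3];

        rwl.writeLock().lock();
        try {
            results[0] = canAcquireFromOtherThread(rwl.readLock());
        } finally {
            rwl.writeLock().unlock();
        }

        rwl.readLock().lock();
        try {
            results[1] = canAcquireFromOtherThread(rwl.writeLock());
            results[2] = canAcquireFromOtherThread(rwl.readLock());
        } finally {
            rwl.readLock().unlock();
        }
        return results;
    }
}
```

`ReadWriteRules.check()` returns `{false, false, true}`: a held write lock blocks readers, a held read lock blocks writers, and readers share the read lock, which is exactly what lets document operations run concurrently in the engine.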

In this article we saw how to deal with locks in Elasticsearch. The first two parts described locking at the global and document levels, which work in a similar way: a lock is a document in the fs index, created with a create-only request and deleted on release. In the last part we discovered the lock mechanism implemented in Elasticsearch's Java code. It relies on the ReentrantReadWriteLock class and its two locks: a shared read lock, taken by document operations, and an exclusive write lock, taken by shard-level operations such as recovery or synced flush.