Conflict resolution in distributed applications - vector clocks

on waitingforcode.com

Conflict resolution in distributed applications - vector clocks

Dynamo paper, already quoted here in other posts, was published in 2007. It's 10 years ago. Even though the time passed, it still proposes interesting concepts to know for data-driven applications. And one of them are vector clocks used to conflict resolution.

In this post we'll learn about vector clocks. The first part, as usually, introduces the main concepts of this algorithm. The second part makes some critic of it. Finally, the last section gives its sample implementation in Scala.

Vector clocks explained

Before we go to the vector clocks, it's important to stop and focus on conflicts that these clocks can resolve. The table below shows one of possible scenarios when, because of some network failure, the updated value wasn't be replicated on Node2 and the user's change at 9:20 was made on out-of-dated version of data:

Time Written data Node1 value Node2 value
9:00 A A A
9:10 B AB A
9:20 C AB AC

Preventing conflict in such configuration is difficult since there is not central node coordinating writes. As you can see, now the database stores 2 versions of our immutable data initially defined as "A". Hopefully there are some mechanisms helping to deal with these problems and one of them are vector clocks.

Vector clocks algorithm is based on vector of tuples composed of (Node, EventsNumber) for all entries, stored on each of nodes involved in given distributed operation. Now, when some node receives new value for given entry, it updates the EventsNumber in the tuple and propagates it to the replicas. The replicas check if they're in conflict with just written value by comparing the elements between tuples, as resumed in the table below:

Local vector Sent vector Comment
[(N1, 1), (N2, 1), (N3, 3)] [(N1, 1), (N2, 0), (N3, 3)] No conflict - values in local vector are greater than or equal to the sent vector. It means that sent vector precedes the local one.
[(N1, 1), (N2, 0), (N3, 3)] [(N1, 1), (N2, 1), (N3, 3)] No conflict - values in local vector are less than or equal to the sent vector. It means that local vector precedes the sent one.
[(N1, 1), (N2, 0), (N3, 2)] [(N1, 1), (N2, 1), (N3, 1)] Conflict - the events are concurrent because we can't determine the precedence rule.

Now, let's see what happens in the clocks that are not in conflict state and how they can be merged:

Local vector Sent vector New vector
[(N1, 1), (N2, 1), (N3, 3)] [(N1, 1), (N2, 0), (N3, 3)] [(N1, 1), (N2, 1), (N3, 3)]
[(N1, 1), (N2, 0), (N3, 3)] [(N1, 1), (N2, 1), (N3, 3)] [(N1, 1), (N2, 1), (N3, 3)]

As you can see, during the merge of vectors, the largest values are taken. The whole process is illustrated more clearly in the last section vector clock example.

Until now we saw that vector clocks can easily detect conflicts. But what happens after detecting one ? The first solution delegates the conflict resolution to the client. The client application receives all conflicted values and it decides how to deal with them (merge, discard, etc.). Another approach is server-based. One of its implementations can be last-write-wins strategy where the most recent value is returned to the client. It's only a short introduction to available solutions that will probably be detailed more in further posts.

Vector clock critics

However, the vector clocks are not a silver bullet and also have the drawbacks. The first of them quoted very often is the fact that they are only able to detect conflicts and not to resolve them. It's the reason why the options like last-write-win are always proposed in addition to vector clocks or as their replacement.

The second potential problem comes from the number of nodes involved in version vector. More they are, more place the vector will take and more time it will take to compute. Of course it depends on replication level and it's not visible at first glance when we work with only 1 row. But when the number of rows is counted in hundreds of millions, the cost of version vector storage can be not neglected.

Vector clock example

The implementation of detection of conflicts through vector clock algorithms is pretty straightforward, as shown in the snippet below:

class VectorClockTest extends FunSuite with BeforeAndAfter with Matchers {

  test("should detect vector clock as concurrent when some values are greater and some less in 2 nodes") {
    val localNode = new NodeVectorClock(Map("N1" -> 3, "N2" -> 1, "N3" -> 6))

    val isConcurrent = localNode.checkIfConcurrent(Map("N1" -> 2, "N2" -> 3, "N3" -> 2))

    assert(isConcurrent)
  }

  test("should detect vector clock as not concurrent when all values of local node are less or equal than the remote node values") {
    val localNode = new NodeVectorClock(Map("N1" -> 3, "N2" -> 1, "N3" -> 1))

    val isConcurrent = localNode.checkIfConcurrent(Map("N1" -> 3, "N2" -> 3, "N3" -> 2))

    assert(!isConcurrent)
  }


  test("should detect vector clock as not concurrent when all values of local node are greater or equal than the remote node values") {
    val localNode = new NodeVectorClock(Map("N1" -> 3, "N2" -> 3, "N3" -> 6))

    val isConcurrent = localNode.checkIfConcurrent(Map("N1" -> 2, "N2" -> 3, "N3" -> 2))

    assert(!isConcurrent)
  }

}

class NodeVectorClock(localNodeVersionVector: Map[String, Int]) {

  private val ValueForMissingRemoteKey = Int.MinValue
  
  def checkIfConcurrent(remoteNodeVersionVector: Map[String, Int]): Boolean = {
    var localIsGreater = false
    var localIsLess = false
    localNodeVersionVector.keys.foreach(localVectorKey => {
      val localCounter = localNodeVersionVector(localVectorKey)
      val remoteCounter = remoteNodeVersionVector.getOrElse(localVectorKey, ValueForMissingRemoteKey)
      localIsGreater = localIsGreater || localCounter > remoteCounter
      localIsLess = localIsLess || localCounter < remoteCounter
    })
    localIsGreater && localIsLess
  }

}

In this post we've discovered vector clocks that can be used to detect conflicting writes in distributed environments. The first section explained how the conflict can look like and also the basics of vector clocks. The second part listed 2 drawbacks of this solution. Finally the last section shown a simple vector clock implementation in Scala.

Read also about Conflict resolution in distributed applications - vector clocks here: Assigning Lamport & Vector Timestamps , Why Cassandra doesn't need vector clocks , , Last-write-wins .