Big Data algorithms articles

on waitingforcode.com

Reservoir sampling for bounded and unbounded data

Every time when I see a new thing, I try to note it somewhere and come back later. The "later" is driven by how many times I will meet that thing. More often I see it in the books or conferences, earlier I deep delve into it. And that's the story of this post about reservoir sampling algorithm I met twice during last month. Continue Reading →

Conflict-Free Replicated Data Type

Pessimistic replication requires a synchronous communication between the main node writing the data and the replicas. However in some cases the optimistic replication can be more efficient and still guarantee the same final result. One of solutions from this category are conflict-free replicated data types. Continue Reading →

HyperLogLog explained

Counting the number of distinct elements can appear a simple task in classical web service-based applications. After all, we usually have to deal with a small subset of data that simply fits in memory and can be automatically counted with the data structures as sets. But the same task is less obvious in Big Data applications where the approximation algorithms can come to the aid. Continue Reading →