Bloom filter has a lot of versions addressing its main drawbacks - bounded source and add-only character. One of them is Scalable Bloom filter that fixes the first issue.
After HyperLogLog and Count-min sketch it's time to cover another popular probabilistic algorithm - Bloom filter.
This post follows the series about approximation algorithms. But unlike before, this time we'll focus on simpler solution, the linear probabilistic counting.
HyperLogLog algorithm described some weeks ago is not the single one approximate solution in the world of Big Data applications. Another one is Count-min sketch.
Counting the number of distinct elements can appear a simple task in classical web service-based applications. After all, we usually have to deal with a small subset of data that simply fits in memory and can be automatically counted with the data structures as sets. But the same task is less obvious in Big Data applications where the approximation algorithms can come to the aid.