approximation algorithms blog posts on waitingforcode.com


Bartosz Konieczny

December 10, 2017 • Big Data algorithms

HyperLogLog explained

Counting distinct elements can seem like a simple task in classical web-service applications. After all, we usually deal with a small subset of data that fits in memory and can be counted with a data structure such as a set. The same task is less obvious in Big Data applications, where approximation algorithms can come to the rescue.
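To make the contrast with an in-memory set concrete, here is a minimal HyperLogLog sketch in Python. It is an illustration under assumptions rather than the post's implementation: the class name, the precision `p=10`, and the use of SHA-256 as the hash function are all choices made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog; names and defaults are assumptions."""

    def __init__(self, p=10):
        self.p = p                  # number of bits used to pick a register
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.p)                   # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64 - p bits
        # Position of the leftmost 1-bit in `rest`, counted from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)        # bias constant, m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for i in range(10_000):
    hll.add(f"user-{i}")
approx_distinct = hll.estimate()
```

With 1024 registers the sketch keeps a fixed, small amount of state regardless of how many elements it sees, whereas a set grows with every new distinct element. The price is an estimate with a relative error of a few percent instead of an exact count.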


April 22, 2018 • Big Data algorithms

Frequency estimation with Count-min sketch

The HyperLogLog algorithm described a few weeks ago is not the only approximate solution in the world of Big Data applications. Another one is the Count-min sketch.
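As a rough illustration of the idea, a Count-min sketch can be written in a few lines of Python. This is a sketch under assumptions, not the post's code: the class name, the `width` and `depth` defaults, and the row-seeded SHA-256 hashing are picked for the example only.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-min sketch; names and defaults are assumptions."""

    def __init__(self, width=1000, depth=5):
        self.width = width      # counters per row
        self.depth = depth      # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive a different hash per row by prefixing the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum across
        # rows is the tightest upper bound on the true frequency.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


cms = CountMinSketch()
for word in ["spark"] * 5 + ["flink"] * 2:
    cms.add(word)
spark_count = cms.estimate("spark")
```

The estimate never falls below the true frequency. It can exceed it when other items collide with the queried one in every row, which becomes less likely as `width` and `depth` grow.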


June 10, 2018 • Big Data algorithms

Cardinality estimation with linear probabilistic counting

This post continues the series about approximation algorithms. Unlike the previous posts, this time we'll focus on a simpler solution: linear probabilistic counting.
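A minimal Python sketch of linear probabilistic counting could look as follows. The function name, the bitmap size `m`, and the SHA-256 hash are assumptions made for this example. The estimator itself is the usual one for this technique: with V the fraction of bits still unset, the cardinality is approximated by -m * ln(V).

```python
import hashlib
import math


def linear_count(items, m=1024):
    """Estimate the number of distinct items with an m-bit bitmap."""
    bitmap = [False] * m
    for item in items:
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        # Each item sets exactly one bit; duplicates hit the same bit.
        bitmap[int.from_bytes(digest[:8], "big") % m] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        # A full bitmap carries no information; a larger m is needed.
        raise ValueError("bitmap saturated, increase m")
    return -m * math.log(zeros / m)


estimate = linear_count(f"user-{i % 500}" for i in range(5_000))
```

The bitmap has to be sized in proportion to the expected cardinality, which is why this approach is simpler than HyperLogLog but less memory-efficient for very large sets.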


July 15, 2018 • Big Data algorithms

Bloom filter

After HyperLogLog and Count-min sketch, it's time to cover another popular probabilistic algorithm: the Bloom filter.
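For readers who want to see the structure before opening the post, here is a minimal Bloom filter sketch in Python. The class and method names, the default `size` and `hash_count`, and the seeded SHA-256 hashing are assumptions for illustration, not the post's implementation.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; names and defaults are assumptions."""

    def __init__(self, size=1024, hash_count=3):
        self.size = size
        self.hash_count = hash_count
        # One byte per bit keeps the example readable; a real
        # implementation would pack the bits.
        self.bits = bytearray(size)

    def _positions(self, item):
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(item))


bloom = BloomFilter()
bloom.add("spark")
seen = bloom.might_contain("spark")
```

An added element is always reported as possibly present, so there are no false negatives. False positives appear when other elements happen to set all the bits a queried element maps to.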


July 22, 2018 • Big Data algorithms

Scalable Bloom filter

The Bloom filter has many variants that address its main drawbacks: a bounded size and an add-only character. One of them is the Scalable Bloom filter, which fixes the first issue.
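One common way to lift the size limit is to chain fixed-size filters and open a new, larger one whenever the current one is full. The Python sketch below follows that idea under stated assumptions: the class names, the `growth` and `tightening` parameters and their defaults, and the SHA-256 hashing are all hypothetical choices for this example. Each new slice gets a stricter error rate so that the combined false-positive rate stays below the requested one.

```python
import hashlib
import math


class _Slice:
    """Fixed-capacity Bloom filter sized for a target error rate."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.error_rate = error_rate
        self.count = 0
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        self.size = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray(self.size)

    def _positions(self, item):
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1
        self.count += 1

    def might_contain(self, item):
        return all(self.bits[position] for position in self._positions(item))


class ScalableBloomFilter:
    """Illustrative scalable filter; parameters are assumptions."""

    def __init__(self, initial_capacity=100, error_rate=0.01,
                 growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # First slice gets error_rate * (1 - tightening); later slices shrink
        # geometrically, so the sum of all rates stays below error_rate.
        self.slices = [_Slice(initial_capacity, error_rate * (1 - tightening))]

    def add(self, item):
        if self.might_contain(item):
            return  # already (possibly) present, nothing to store
        current = self.slices[-1]
        if current.count >= current.capacity:
            current = _Slice(current.capacity * self.growth,
                             current.error_rate * self.tightening)
            self.slices.append(current)
        current.add(item)

    def might_contain(self, item):
        return any(s.might_contain(item) for s in self.slices)


sbf = ScalableBloomFilter(initial_capacity=10)
for i in range(100):
    sbf.add(f"user-{i}")
slice_count = len(sbf.slices)
```

Old slices are never resized or rehashed, so earlier elements stay queryable. The cost is that a lookup may have to probe every slice.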

