General Big Data articles

4-day workshop Β· In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that β€” unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants Β· production-ready templates
See the full curriculum β†’ €7,000 flat fee Β· cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Big Data tools

Previously we discovered two popular architectures in Big Data systems - lambda and kappa. Because it was new and pretty long concepts to explain, we expressly ignored tools.

Continue Reading β†’

Big Data architectures

When some years ago I done a small POC Hadoop/MapReduce project based on a million song dataset (my old blog in French), I expressly omitted the part about architecture. It was a mistake because correctly designed architecture is as important as code written behind.

Continue Reading β†’

Big Data glossary

Originally, Big Data can seem to be strictly related to one tool - Hadoop. However, it's a misunderstanding of the concept because it hides more interesting stuff.

Continue Reading β†’