Gnocchi articles

4-day workshop Β· In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that β€” unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants Β· production-ready templates
See the full curriculum β†’ €7,000 flat fee Β· cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Choosing time-series database for study

In order to learn a new thing, nothing better than try it. However in some cases the choice of the tool to study is not easy. It's especially true in the context of data storage and though also in the context of time-series databases introduced in one of previous posts.

Continue Reading β†’

Gnocchi architecture

Understanding the architecture is the key of working properly with any distributed system. It's why the series of post about Gnocchi starts by exploring its components.

Continue Reading β†’

Carbonara storage format

Even though carbonara is mostly known as an Italian pasta dish, in the context of Gnocchi it means completely different thing. Carbonara is the name of time points storage format in Gnocchi.

Continue Reading β†’

Sacks - data parallelization unit in Gnocchi

To facilitate parallel processing Apache Spark and Apache Kafka have their concept of partitions, Apache Beam works with bundles and Gnocchi deals with sacks. Despite the different naming, the sacks are the same for Gnocchi as the partitions for Spark or Kafka - the unit of work parallelization.

Continue Reading β†’

Reading aggregates in Gnocchi

Gnocchi writes data partitioned by split key. But often such splitted data must be merged back for reading operations. This post focuses on "how" and "when" of this process.

Continue Reading β†’

Resources and metrics in Gnocchi

Data processing in Gnocchi is strongly related to the index information. One of such valuable assets are metrics and resources, covered just below.

Continue Reading β†’