Berlin Buzzwords 2023 - notes for data engineers

That's a conference I heard about only recently. What a huge mistake! Despite the lack of the word "data" in its name, it covers many interesting data topics. Before I share my notes from this year's Data+AI Summit, let me do the same for Berlin Buzzwords!

Streaming

A Crash Course in Error Handling for Streaming Data Pipeline by Stefan Sprenger

Stefan also gives some code details on how to implement dead-lettering and retries with Kafka Streams.
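To make the two patterns concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams API and not the code from the talk; `TransientError`, `DeadLetter`, `run_pipeline` and `flaky_double` are hypothetical names used only for illustration. The idea: errors that may go away (such as a timeout) are retried a bounded number of times, while records that can never succeed go straight to a dead-letter collection, so the pipeline keeps running.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a timeout."""


@dataclass
class DeadLetter:
    record: str
    error: str
    attempts: int


def run_pipeline(
    records: List[str],
    handler: Callable[[str], int],
    max_attempts: int = 3,
) -> Tuple[List[int], List[DeadLetter]]:
    output: List[int] = []
    dead_letters: List[DeadLetter] = []
    for record in records:
        for attempt in range(1, max_attempts + 1):
            try:
                output.append(handler(record))
                break  # success, move on to the next record
            except TransientError as exc:
                # Retry until the attempts are exhausted, then dead-letter.
                if attempt == max_attempts:
                    dead_letters.append(DeadLetter(record, repr(exc), attempt))
            except Exception as exc:
                # Non-retryable error: dead-letter immediately, do not retry.
                dead_letters.append(DeadLetter(record, repr(exc), attempt))
                break
    return output, dead_letters


calls: Dict[str, int] = {}


def flaky_double(record: str) -> int:
    """Doubles a numeric record; fails once for "7" to simulate a timeout."""
    calls[record] = calls.get(record, 0) + 1
    if record == "7" and calls[record] == 1:
        raise TransientError("simulated timeout")
    return int(record) * 2


output, dead_letters = run_pipeline(["1", "oops", "7"], flaky_double)
print(output)        # [2, 14]
print(dead_letters)  # one entry, for the unparsable "oops" record
```

In a real streaming job the dead letters would typically be written to a dedicated topic together with the error details, so that they can be inspected and replayed later.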

Minimizing the memory footprint of Apache Flink by Robert Metzger

Thank you, Robert! I haven't seen such a technically detailed investigation of the JVM in the data context for years! In summary:

A Kafka Client's Request: There and Back Again by Danica Fine

I didn't expect to see an in-depth talk about Apache Kafka anywhere other than Kafka Summit. Good lord, how wrong I was! Danica shared a great deep dive into Kafka client requests. Put differently, she explained this picture in detail:

The talk has a lot of details; here I'm summarizing my discoveries and important reminders:

Data engineering

Apache Airflow in Production - Bad vs Best Practices by Bhavani Ravi

Because it's always good to have a handy list of best practices, I couldn't miss Bhavani's talk about them in Apache Airflow!

When Probably is Good Enough by Savannah Norem

Savannah gave a great talk about probabilistic data structures. How good it was to refresh my memory and discover the structures I hadn't covered in my exploration back in 2018. Some takeaways from the talk:
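For readers new to the topic, a Bloom filter is probably the best-known probabilistic data structure, so it serves here as an example. The sketch below is an illustration, not code from the talk, and the class name, the default sizes and the SHA-256 based hashing are assumptions made for readability rather than performance. It answers "have I seen this item?" with either "definitely not" or "probably yes".

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, hash_count: int = 3) -> None:
        self.size_bits = size_bits
        self.hash_count = hash_count
        # One byte per bit keeps the sketch readable; a real implementation
        # would pack the bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive hash_count positions by salting the item with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[position] for position in self._positions(item))


seen = BloomFilter()
for user in ["alice", "bob"]:
    seen.add(user)

print(seen.might_contain("alice"))  # True: added items are always found
print(seen.might_contain("carol"))  # most likely False; false positives are possible
```

The trade-off is the one in the talk's title: the filter never stores the items themselves, so it stays small, at the price of an occasional false positive.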

Hadoop Vectored IO: your data just got faster! by Steve Loughran

Steve Loughran explained the new Hadoop Vectored IO, which improves reading from cloud object stores:
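One reason a vectored read can help on object stores is that the reader receives all the wanted byte ranges up front, so nearby ranges can be merged into fewer, larger requests. The sketch below illustrates only that merging idea in plain Python. It is not the Hadoop API, and `coalesce_ranges`, `max_gap` and `max_size` are hypothetical names and parameters.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int, max_size: int) -> List[Range]:
    """Merge ranges that are at most max_gap bytes apart, as long as the
    merged range stays within max_size bytes."""
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            last_offset, last_length = merged[-1]
            last_end = last_offset + last_length
            new_end = max(last_end, offset + length)
            close_enough = offset - last_end <= max_gap
            small_enough = new_end - last_offset <= max_size
            if close_enough and small_enough:
                # Extend the previous request instead of issuing a new one.
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, length))
    return merged


requests = coalesce_ranges(
    [(0, 100), (120, 80), (5000, 200), (210, 40)],
    max_gap=64,
    max_size=1024,
)
print(requests)  # [(0, 250), (5000, 200)]
```

Four requested ranges become two requests here. Reading a few unneeded bytes in the gaps is usually cheaper than paying the latency of an extra request to the object store.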

I also planned to watch the talks about column lineage, Kaldb, ClickHouse, and Data Mesh migration, but in the end I had to postpone them because of other topics waiting in my head. I'll surely watch them one day, but in the meantime I prefer to share the notes for the first 6 presentations I watched. Hopefully you find them useful!

