Even though I was blogging less in the second half of the previous year, the retrospective is still the blog post I'm waiting for each year. Every year I summarize what happened in the past 12 months and share with you my future plans. It's time for the 2024 Edition!
Data engineering design patterns
In my previous retrospective I had an item called Secret project. Many of you already know it was a code name for the Data Engineering Design Patterns book I'm currently writing.
My initial goal was to keep both the blog and the book at their normal writing pace, which for the blog means 1 post per week and for the book means sticking to the release plan. But as you noticed, that was not possible and I stopped blogging in August. The book, due to the expected delivery outcomes, has become my priority, which translated into:
- 345 pages written at the moment; keep in mind, it's just for statistics - I assume as a data guy, you like the numbers ;) - this value may change in the final version after the last reviews
- 128 code snippets in the GitHub repo; the examples for the unpublished chapters are sitting and waiting in a private fork before being promoted to the public space
- 69 schemas created to illustrate the patterns
- 68 design patterns detailed in the book
- Countless writing hours in 6 different cities
Hopefully, you understand better now why I couldn't keep the initial pace for blog posts.
Blog in 2024
Despite switching my focus to the book, I published some blog posts. For sure, the number decreased for the 6th year in a row, but I hope the posts delivered some value to you about stream processing with Apache Spark Structured Streaming and friends:
Year | Blog posts |
---|---|
2024 | 28 |
2023 | 53 |
2022 | 68 |
2021 | 93 |
2020 | 105 |
2019 | 114 |
2018 | 139 |
As initially planned, I spent most of my efforts on the streaming topics. That explains the domination of Apache Spark Structured Streaming and the first mentions of Apache Flink in the statistics per category:
Topic | Blog posts in 2024 |
---|---|
Apache Spark Structured Streaming | 10 |
General data engineering | 7 |
Delta Lake | 4 |
Apache Flink | 3 |
Apache Spark | 1 |
Apache Spark SQL | 1 |
Data engineering patterns | 1 |
Data engineering on the cloud | 1 |
Plans for 2024
Before I share with you my plans for 2025, let's go back to 2024 and see what I was expecting from it and what the reality was:
Blogging:
- Stream processing still has some exciting areas to discover like Apache Flink and streaming databases. It'll probably remain my focus next year with the blog posts about the aforementioned topics, plus of course Apache Spark Structured Streaming. - I'm happy to have bootstrapped the Apache Flink chapter on the blog. However, I'm not satisfied with my progress as I had to leave some topics aside, including more advanced deep dives.
- Additionally, I would like to explore streaming capabilities on the cloud, but it's less of a priority than learning the internals of Apache Flink. - Indeed, I didn't find enough time to work on streaming aspects on the cloud.
- I still have a dream of reaching the 1000 blog posts published. So far I'm at 921 and I won't be able to reach this magic number next year. But let's dream big and be as close as possible to this number! - If you do the math, you'll realise I'm still far away from this magic number.
Secret project (aka Data Engineering Design Patterns):
- I can't tell more on that now but I would like to deliver it as planned, so by the end of 2024. - As of this writing I'm doing the last review before the book hits the release stage. Everything went as planned, mainly thanks to a temporary deprioritization of the blog.
Become a Data Engineer:
- If I have some time to spare, I'll work on the remaining courses marked as WIP. - Unfortunately, I didn't find time and instead of this course, I'll focus on something else in 2025.
Data engineering patterns on the cloud:
- Here the goal is to add at least 9 patterns to reach another magic number of 100 patterns. - Again, I couldn't reach this number as I didn't have time to work on the ebook.
Speaker:
- I don't have a fixed number here but I would like to speak at one of these 2 dream conferences this year. - A 50% success here as I spoke at the Data+AI Summit in June. I was hoping to speak at Devoxx Paris but my CfP didn't pass. Maybe in 2026?
Cloud data engineer:
- Inevitably, I'll need to renew them in 2024. Probably by the end of the year, when I should have more time after the rush on the secret project. - So far I've managed to renew my AWS Data Analytics certification. Due to time constraints, I couldn't prepare for Azure Data Engineer and GCP Data Engineer.
Plans for 2025
What about next year?
Data Engineering Design Patterns:
- It'll still be my priority in 2025. However, as the content is already there, I'll mainly focus on adding more code snippets and thinking about new patterns.
Blogging:
- Apache Spark. Apache Spark will remain the main topic on the blog, but this time I don't plan to spend my time on the Structured Streaming part exclusively. That said, I still want to explore the areas I couldn't explore last year but it will be in addition to the batch-related topics.
- Delta Lake. So far I have been focusing on streaming aspects mostly. This year I would like to extend the scope to other more general topics.
- Unity Catalog. It was one of the disruptive Open Source announcements in 2024 but because of the book I didn't find time to start exploring the project. It should change this year, fingers crossed!
- Apache Flink without pressure. I still have some topics open from 2024 that I would like to know better. However, last year I realized how challenging it was to deep dive into Apache Flink's internals and keep up my other activities. For that reason, I don't put any pressure on numbers here.
- Databricks. I'm happy to share I was recognized as a Databricks MVP for the 3rd time. I feel a need to share my private notes, thoughts and findings on the blog. They should be mostly related to the streaming area and software engineering best practices.
- General data engineering. I have some general data engineering topics in my backlog, mostly related to tests, streaming, and software engineering. I need to write them down as these topics have started haunting me 😉
- 1000 blog posts. With all these various topics to cover it should be easy to write the missing 60-ish blog posts. But I'm not that confident as I also need to keep a balance with my other life activities. So let's see and target the 1000 blog posts by December 2025. Even if I don't reach this number, in 2026 I'll still be closer to it than I am now 🙂
Freelancing:
- Databricks, Apache Spark, and cloud computing will remain my main focus areas in 2025. If you have any short- or long-term project you feel I could help with, simply drop me an email at contact@waitingforcode.com.
- Live training. I'll be updating the offer with new and shorter trainings. I'll post an update soon.
Become a Better Data Engineer:
- This year I want to resume the work on this extra learning initiative. However, I'd like to propose a new format, much shorter and, hopefully, better suited for the busy people you are! That said, the essentials, so code snippets and homework assignments, will remain the same 💪
Data engineering patterns on the cloud:
- I already know that with all things planned and shared so far I won't have enough time to work on the ebook.
Speaker:
- I never submit the same talk at different conferences. Consequently, each conference means weeks of preparation for the content and code, as well as many hours of rehearsal. Even though I like this part, I already know I won't have time this year.
Cloud data engineer:
- Databricks. It'll be my certificate priority this year. I don't know which one to choose, and why not both?
- Azure and GCP data engineer. I still need to renew both data engineering certificates for these two clouds. I'll do my best but the Databricks certificate is more important at this moment.
The list may not look challenging at all; just a few blog posts to write. But believe me, it's not the case. I have another list with more private topics and when I mix them, my waitingforcode plans for 2025 are ambitious, but exciting at the same time. See you there in 12 months to see how well I did. Hopefully, in the meantime, I'll be able to write some interesting posts for you :)
Wishing you all a happy, healthy and successful 2025!
Best,
Bartosz