Before I share the usual retrospective for the past year, I want to thank you for following along in 2025! Even though I'm primarily writing for "me-from-the-future", it's always great to know that people other than my future self find these posts helpful ;)
Data engineering design patterns
Last year (cf. retrospective for 2024) I revealed the so-called Secret project I had been working on for the past months. This year I'm excited to share that the Data Engineering Design Patterns book has finally been released!
Technically, the book has even been upgraded already! Thanks to Buf's sponsorship, there is an additional, 11th chapter on streaming data design patterns! This extra chapter is only available in the PDF version, though!
And besides pure facts, let me share some community feedback, including comments on Amazon's page, a few blog posts, and community reviews. I might have mentioned that already, but devoting my Miracle Morning routine to writing the book for the past two years was totally worth it! Here are some of the most recent screenshots I captured from social media:
Blog in 2025
After spending 2024 mostly on writing the book, I did a bit better this year with the blog posts, as you can see in the summary table below:
| Year | Blog posts |
|---|---|
| 2025 | 40 |
| 2024 | 28 |
| 2023 | 53 |
| 2022 | 68 |
| 2021 | 93 |
| 2020 | 105 |
| 2019 | 114 |
| 2018 | 139 |
Besides, I'm pretty satisfied with the distribution of the blog posts. As planned, I could finally write a bit more about Databricks, which translates into 9 blog posts published in 2025. The only disappointment for me is Apache Flink. I wanted to spend some extra time on it without putting too much pressure on myself, but I couldn't. Not having this pressure was probably a mistake, leading to 0 Apache Flink blog posts in 2025. The overall distribution is shown in the next table:
| Topic | Blog posts in 2025 | Blog posts in 2024 |
|---|---|---|
| Databricks | 9 | 0 |
| Apache Spark SQL | 7 | 1 |
| Delta Lake | 7 | 4 |
| Apache Spark Structured Streaming | 6 | 10 |
| General data engineering | 6 | 7 |
| SQL | 2 | 0 |
| Data engineering patterns | 2 | 1 |
| PySpark | 1 | 0 |
Plans for 2025
Before I share my plans for 2026 with you, let's go back to 2025 and see what I was expecting from it and what the reality was:
Data Engineering Design Patterns:
- It'll still be my priority in 2025. However, as the content is already there, I'll mainly focus on adding more code snippets and thinking about new patterns. - ✅ the book was released as planned!
Blogging:
- Apache Spark. My main focus should remain Apache Spark. This time I don't plan to focus on the Structured Streaming part exclusively but it should remain the most important area. - ✅ Indeed, you can see this time the distribution was pretty equal between the Apache Spark SQL and Apache Spark Structured Streaming blog posts.
- Delta Lake. So far I have been focusing mostly on streaming aspects. This year I would like to extend the scope to other, more general topics. - ✅ I did better in this area too. I went from 4 to 7 blog posts on Delta Lake, which is great!
- Unity Catalog. It was one of the disruptive announcements in 2024 but because of the book I didn't find time to start exploring the project. It should change this year, fingers crossed! - ❌ Unfortunately, I couldn't organize myself to cover Unity Catalog.
- Apache Flink without constraints. I still have some topics open in 2024 that I would like to know better. However, last year I realized how challenging it was to deep dive into Apache Flink's internals and I don't put any pressure on numbers here. - ❌ As I mentioned, 0 blog posts on Apache Flink is definitely not a result to be proud of.
- Databricks. I was recognized as a Databricks MVP for the 3rd time in a row and I feel a need to share my thoughts and findings, mostly related to the streaming part or software engineering best practices. - ✅ Pretty good progress on this topic. I've bootstrapped the Databricks category on the blog and already have some blog posts planned for next year!
- General data engineering. I have some general data engineering topics in my backlog, mostly related to tests, streaming, and software engineering. I need to write about them as these topics have started haunting me 😉 - ✅ I can say it was generally OK. The number doesn't differ a lot from 2024.
- 1000 blog posts. With all these various topics to cover it should be easy to write the missing 60-ish blog posts. But I'm not that confident as I also need to keep a balance with my other life activities. So let's see and target the 1000 blog posts by December 2025. Even if I don't reach this number, in 2026 I'll still be closer to it than I am now 🙂 - ❌ I'm still not there but I'm definitely much closer to 1000 blog posts. At this moment I have published 985 articles. I should reach the magical number in 2026!
Freelancing:
- Databricks, Apache Spark, and cloud computing will remain my main focus areas in 2025. If you have any short- or long-term project you feel I could help with, simply drop me an email at work@waitingforcode.com. - ✅ Another exciting Apache Spark and Databricks project behind me and a new one to come in 2026!
- Live training. I'll be updating the offer with new and shorter trainings. - ❌ In the end, I had to spend more time on other activities and I couldn't update the plan.
Become a Better Data Engineer:
- This year I want to resume the work on this extra learning initiative. However, I'd like to propose a new format, much shorter and, hopefully, better suited for the busy people you are! That said, the essentials, so code snippets and homework assignments, will remain the same 💪 - ❌ It was completely out of my focus in 2025, and I will give up this project.
Data engineering patterns on the cloud:
- I already know that with all things planned and shared so far I won't have enough time to work on the ebook. - ❌ Confirmed, I didn't find any free time to update the e-book.
Speaker:
- I never submit the same talk at different conferences. Consequently, each conference means weeks of preparation for the content and code, as well as many hours of rehearsal. Even though I like this part of my knowledge sharing activity, I already know I won't have time this year. - ❌ It must be an achievement not to achieve a not-speaking engagement 🙃 Despite my initial commitment to take a break, by the end of the year I spoke at 3 meetups (online + physical).
Cloud data engineer:
- Databricks. It'll be my certificate priority this year. - ❌ Not yet, but still in my head ;)
- Azure and GCP data engineer. I still need to renew both data engineering certificates for these two clouds. - ❌ I will treat Databricks as a priority.
Plans for 2026
What about next year?
Data Engineering Design Patterns:
- I don't know what the new year will bring, maybe something as unexpected as Chapter 11? The only thing I know is that I have a bunch of notes on data engineering design patterns I would like to transform into blog posts.
Blogging:
- Apache Spark. No exception for yet another year ;) Especially now when the release cycle should change, I will have a lot to write and share about!
- Databricks. I particularly enjoyed exploring Databricks features and sharing them with you in 2025. So I'll likely continue the journey; not to mention that with the 3rd nomination to MVPs comes great responsibility ;)
- Delta Lake. This item is somewhat related to the Apache Spark point. I still love exploring Delta Lake's under-the-hood mechanics, and deletion vectors and liquid clustering are among the first features you'll likely encounter. (Whenever someone mentions 'Delta Lake', these two topics immediately spring to mind! ^_^).
- Unity Catalog. I will keep it on my list but instead of going through the Open Source version, I'm going to focus on the Databricks implementation.
- Software engineering. Over the years I realized my software engineering background helped me a lot in organizing and leading data engineering projects. In 2026 I have plans to share and learn more on software engineering aspects.
- General data engineering. I might be blogging a bit on data engineering design patterns here. But as of today, I've more software engineering topics (in data engineering context) than pure data engineering ones ;-)
- Cloud data engineering. I left this part aside but I would like to come back, at least with the "What's new on the cloud for data engineers" series. But I must admit, writing those summaries was taking time and I might think of generating them automatically this time with some prompt.
- 1000 blog posts. Only 15 blog posts to reach the magic number! The goal will be to do it by the next Data+AI Summit (June 15-18, 2026).
Freelancing:
- Databricks and Apache Spark projects on the cloud are still my main area of focus. I'm pretty excited about my next freelancing project as I have probably found my unicorn, which is a project mixing Apache Flink for streaming and Databricks with PySpark for batch workloads. In the meantime, if you have something in mind for now or the future, write to me at contact@waitingforcode.com so that we stay in touch!
- Workshops. My workshop ideas have become more concrete after some preparation time in the last two weeks. Hopefully I'll be able to share something on them around April.
Speaker:
- I don't know yet. Part of me would like to resume this chapter but the other part prefers to spend the preparation energy on the workshops... I'm leaving the point here to see next year how my no-speaking engagement went. As you already know, it went quite badly in 2025 because I did 3 talks instead of 0 ;)
Cloud data engineer:
- Databricks. I keep it as a priority for the certificate this year.
- Azure and GCP data engineer. As I have been working more closely with AWS and Azure over the past years, if I have time, I will certainly devote it to an Azure certification. But this item is more like the first backlog item for the next sprint; not my priority but a nice-to-have win in 2026 🙂
I like this retrospective series. Not because it's close to my birthday but because it shows the progress made over a longer period of time, and because it helps organize the incoming year (insisting on "helps" as everything remains flexible and the only constant in life is change).
Wishing you all a healthy and successful 2026!
Best,
Bartosz
Consulting
With nearly 17 years of experience, including 9 as a data engineer, I offer expert consulting to design and optimize scalable data solutions.
As an O’Reilly author, Data+AI Summit speaker, and blogger, I bring cutting-edge insights to modernize infrastructure, build robust pipelines, and
drive data-driven decision-making. Let's transform your data challenges into opportunities—reach out to elevate your data engineering game today!
👉 contact@waitingforcode.com
🔗 past projects

