Retrospective: 2022 on

A new year is coming, and it's a great moment to summarize what has happened on the blog and around it!

Blog in 2022

This year I blogged less. I wrote only 67 blog posts. That's almost 30% less than last year and 50% less than 2 years ago:

[Table: number of blog posts per year (Year | Blog posts)]

You see the trend, don't you 🥺? I love blogging, and the number has almost halved in 4 years... Is it bad? Not at all. Despite my geeky and introverted nature, I also love people, especially my little family and my friends. Last year my lovely daughter was born, and now, instead of writing blog posts, we simply spend time together 🙂 These 67 blog posts + fully remote work + family time + sport activities are rather a sign that I'm close to reaching a work-life balance. I still have some things to improve, but it's much better than it was!

Among these 67 blog posts, I wrote the most (13) for the Data engineering on the cloud category. Just behind, there are 12 blog posts about Apache Spark SQL. The third category in the top 3 is... PySpark! Understanding it was a missing piece in my Apache Spark exploration, and I'm happy to have finally started this part.

Among other things, I'm also very happy to have started writing about table file formats. They all have roughly the same number of blog posts, and the journey will continue next year. What could I have done better? I'm a little bit disappointed by the number of blog posts about Apache Spark Structured Streaming (only 5), but with Project Lightspeed, I should have much more to write about next year!

At a more fine-grained level, the most read blog posts were:

  1. Distinct vs group by key difference - 4762 views, published on 01/01/2022
  2. Task retries in Apache Spark Structured Streaming - 2958 views, published on 08/01/2022
  3. Shuffle configuration demystified - part 1 - 2855 views, published on 12/03/2022
  4. Reverse ETL - 2365 views, published on 09/01/2022
  5. Kubernetes concepts for Apache Spark - 2172 views, published on 08/01/2022

Blog and social media in 2022

In 2022 I also decided to post regular updates on LinkedIn. I replaced my bi-weekly updates with one update per blog post, and the top 3 most engaging articles are:

  1. Serialization in PySpark - 86 engagements
  2. Shuffle in PySpark - 56 engagements
  3. Data contract - 46 engagements

I also have a whole year of history on Twitter, and the top 3 there are:

  1. Data+AI Summit retrospective - 50 engagements (comments + likes + retweets)
  2. Introduction to table file formats - 40 engagements
  3. Task retries in Structured Streaming - 38 engagements

Other projects

Blogging was not my only knowledge-sharing activity in 2022. I also opened the Become a Data Engineer classes and released my first ebook, Data engineering patterns on the cloud.

Plans for 2023


Become a Data Engineer:

Data engineering patterns on the cloud:


Cloud data engineer:

All of the above are just plans. I know from the past that I might not fully succeed in reaching them. I may discover another exciting thing in 2023, take more time than expected to prepare the certification, or simply speak at 6 conferences instead of 3 and have less time to check all the boxes. Anyway, I will share all the defeats and victories with you. Thank you for being with me in 2022, and I hope you'll enjoy the next year too!

Wishing you all a happy, healthy and successful 2023!
