Retrospective: 2021 on

2021 is coming to an end and, like last year, it's a great moment to summarize what happened and what will happen in 2022!

Blog in 2021

I blogged less (92 blog posts) than in 2020 (105). But I'm happy about that because I spent that time with my newborn daughter! And by the way, the article you're currently reading is number 800 on the blog! I hope to write to you about the 1000th in 2 years 🤞.

Among these 92 blog posts, surprisingly, I wrote the most (26) about data engineering on the cloud. Apache Spark comes only in 2nd place (17 posts about Spark SQL) and 3rd place (16 posts about Structured Streaming).

The diagram shows that, for the first time, I reached the goal set in 2020, which was to learn more about Apache Spark and data engineering on the cloud, and to start building a Pi-shaped profile. Although I faced multiple temptations, such as ACID-compatible file formats, Apache Flink for stream processing, or Open Source data governance projects, this time I managed to honor the contract I signed with myself last December.

The Top 5 most read blog posts are:

  1. PySpark schema inference and 'Can not infer schema for type str' error
  2. Stack operation in Apache Spark SQL
  3. Performance optimization lessons from Spark+AI and Data+AI Summits
  4. GCP BigTable or AWS DynamoDB, yet another comparison
  5. What's new in Apache Spark 3.1 - JDBC (WIP) and DataSource V2 API

I'm quite surprised by the popularity of the first article. It's not the one I'm the most proud of, but somehow it ranks pretty well in search engines and happens to be the most popular in 2021.

Changes in 2022

To start, a "new" country. After spending my whole adult life in France (almost 15 years), my wife and I decided to return to Poland. We don't know if it's the best choice. All we know is that it's the moment to try and live closer to the family. Our daughter doesn't go to school yet, and although moving with a small child is hard, it's easier than moving and changing schools with an older one. Hence, if not now, it would probably never happen.

The decision to move brings another change. I will leave my employee status and become a freelance data engineer. I'm a bit scared of the administrative work I may have to do as an entrepreneur, and especially of the precious time it will take. To overcome that, I'll probably need to learn to delegate tasks and keep my energy for what makes me happy every morning.

I didn't mention freelancing without reason. It'll impact my organization and routine. I can no longer assume that I'll get paid regularly every month, so I'll have to find another source of income that, at worst, could "pay the bills". That's why in 2022 I'll be working on extending my e-learning portfolio, which will probably impact the blogging schedule.

Due to these changes, I'm not sure I'll be able to release 2 new blog posts each week. Sometimes it will probably be more, sometimes less, and for sure there will be periods with no posts at all. Anyway, I will continue to inform you about new blog posts on Twitter and in the newsletter.

Blogging topics for 2022

What about the topics you can expect next year on the blog? This time, I'll put the answer as a list:

As you can see, things will be different for me next year, and that will impact the blogging activity. Nonetheless, I hope you'll still be able to find interesting things on the blog. Thank you for being with me in 2021 and, hopefully, see you next year!

Wishing you all a happy, healthy and successful 2022!
