My ideal data engineer job posting

"Data is the new oil" is one of the popular phrases describing the huge role of data in our world. And like other resources, data must be extracted too. To find these "oil workers", organizations look for, among others, data engineers. The task is more or less easy, and its difficulty depends on various factors. From my 6-year perspective, one of the key starting elements is the job announcement.

In this blog post I'd like to share with you my observations about data engineering job postings, with a short disclaimer though. It's my ideal vision, not a universal truth. You certainly have your own, and to share it, please comment in the section under the article so that all the article-related elements stay in a single place. By mixing our experiences and expectations, we could build something more complete!

How to start?

Job postings usually start with a section presenting the company, possibly preceded by some eye-catching phrases with exclamation or question marks. I totally get the idea but I don't always agree with the implementation. Some of the things to improve I've observed are:

Why these two? My reading flow relies on the assumption that I will never know all the companies in the world, whereas I do know what happens in the industry. So I naturally focus first on the project itself, including its technical challenges or growth opportunities, and only afterwards look for more information about the company.

Technical details

After the introduction comes the technical details part. I would start it with a general presentation of the technical context. Does the project use only batch processing? Or are there streaming parts as well? If so, are they business pipelines (e.g. stateful processing) or rather technical ones (e.g. data ingestion)? Do the processing jobs rely on SQL only, or are they a mix of SQL/Python or SQL/Scala? This information avoids interviewing people attracted by one specific data processing mode that the job might not offer.

The next nice-to-have thing is the list of tools. Listing the frameworks or cloud services in use is great, and almost a must-have for any serious job announcement. But I like to find that list extended with a short description of the use case each tool solves. It's valuable information that can also be used as a filter. Let's take the example of the AWS Lambda service and 2 different reasonings:

And still, if using AWS Lambda as a runtime for stateful jobs didn't interest me, I could check the other frameworks or services and possibly not treat this Lambda point as a decisive criterion. The point is, I'd probably finish reading this type of posting before moving on to the next one.

Also, put some numbers! I must admit, reading things like "we're ingesting 300 000 events/minute" is more appealing than "we're ingesting events from a 3rd party". But it also gives extra information and can avoid unnecessary interviews with candidates who are, or aren't, attracted by Big Data volumes. And if your volumes are modest, there is nothing to be ashamed of. It's just information that can attract a specific group of candidates and save you, as a recruiter, some time by interviewing the right ones.

Job details

Besides the technical details, the job details remain an important part of the announcement. Ideally, I'd like to find details like:

I'm not claiming that all offers must include all these details. These are only my observations, which I wanted to share with you to hopefully help make our data engineering world better. One day we'll all look for a new job, and there is nothing worse than not being able to pick the good ones.
