My ideal data engineer job posting

"Data is the new oil" is one of the popular sayings describing the huge role of data in our world. And like other resources, data must be extracted too. To find these "oil workers", organizations look for, among others, data engineers. Finding them is more or less difficult depending on various factors. From my 6-year perspective, one of the key starting elements is the job announcement.


Bartosz Konieczny

In this blog post I'd like to share with you my observations about data engineering job postings, with a short disclaimer though: it's my ideal vision, not a universal truth. You certainly have your own, and if you want to share it, please use the comments section under the article so that all the article-related elements stay in a single place. By mixing our experiences and expectations, we could build something more complete!

How to start?

Job postings usually start with a section presenting the company, possibly preceded by some eye-catching phrases ending with exclamation or question marks. I totally get the idea, but I don't always agree with the implementation. Some of the things to improve that I've observed are:

Generalities, such as "Have you ever dreamt of working in an early stage start-up?" or "You love challenges?". These are only 2 examples of overly broad eye-catching phrases. An "early stage start-up"? Sure, but what if an early stage means different things for you and me? "Challenges"? I'm looking forward to them, but what if I overcame these challenges 3 years ago? Eye-catching sentences are not bad, but I like details and would like to learn what the challenges are or how old the start-up is.
It's mainly about us. Every serious candidate interested in a job announcement will search for extra information about the company anyway. Therefore, from my perspective, a short company introduction is better than a long block of text. The introduction could include things like the company profile (product, consulting), activity, and location, so that candidates can filter out the announcement if it's not from their preferred sector or company type. The rest can be freely placed at the end of the announcement.

Why these two? My reading flow relies on the assumption that I will never know all the companies in the world, whereas I do know what happens in the industry. So I naturally focus first on the project itself, including its technical challenges or growth opportunities, and only afterwards look for more information about the company.

Technical details

After the introduction comes the technical details part. I would start it with a presentation of the general technical context. Does the project use only batch processing? Or maybe there are streaming parts as well? If so, are they business pipelines (e.g. stateful processing) or rather technical ones (e.g. for data ingestion)? Do the processing jobs rely on SQL only, or are they a mix of SQL/Python or SQL/Scala? This information avoids interviewing people attracted by a specific data processing mode that the job doesn't actually involve.

The next nice-to-have thing is the list of tools. Listing the frameworks or cloud services in use is great, and almost a must-have for any serious job announcement. But I like to find this list extended with a short description of the use case each tool solves. It's valuable information that can also be used as a filter. Let's take the example of the AWS Lambda service and 2 different lines of reasoning:

A vague job announcement: We're using AWS Lambda functions.
Me: You're using AWS Lambda functions? Great, but for what? I have to write to the recruiter to learn more about that... I'll keep checking other postings and maybe come back here later.
A detailed job announcement: We're using Lambda functions as a runtime environment for stateful jobs.
Me: Awesome! That's something interesting. Let's keep reading!

And even if using AWS Lambda as a runtime for stateful jobs weren't interesting to me, I could check the other frameworks or services and simply not treat this Lambda point as a decisive criterion. The point is, I'd probably finish reading this type of posting before checking the next one.

Also, add some numbers! I must admit, reading things like "we're ingesting 300 000 events/minute" is more appealing than "we're ingesting events from a 3rd party", but it also gives extra information and can avoid unnecessary interviews with candidates who are, or aren't, attracted by Big Data volumes. And if your volumes are small, there is nothing to be ashamed of. It's just information that can attract a specific group of candidates and save you, as a recruiter, some time by interviewing the right ones.

Job details

Besides the technical details, the job details remain an important part of the announcement. Ideally, I'd like to find details like:

Salary. No need to be precise, a range is enough. It's probably the most controversial point, but some big tech companies, like Microsoft, have already announced they will regularly post pay ranges. Once again, there is nothing to be ashamed of. It's information that can save time for both the recruiter and the candidate.
Expectations. Ideally, related to the technical details presented before. No need to list the whole stack of your company, IMO. It may confuse the candidate.
Work model. Is the job open to full remote? Or maybe it's hybrid or on-site only? It's almost essential information after the post-pandemic revolution in work modes (I always hope to see hybrid or fully remote here ;-) ).
Working hours and flexibility. Is it strictly a 9-5 job or does it allow some flexibility? Although it's not a rule, this info combined with the work model can give some hints about the organizational culture ("Big Brother is watching you" vs "Getting the job done").
Recruitment process. What are the next steps and how long do they take? Nobody wants to spend 2 months in a recruitment process. Some people prefer a more intensive approach, like one-hour interviews every day, while for others a less tight schedule will be better. I'd also include the interview type: a leetcode-like interview, a homework assignment, a whiteboard design session... All this gives extra filtering information and helps candidates either prepare better or skip the posting.
Team and responsibility. Who are the team members? Sharing their LinkedIn pages, GitHub accounts, or StackOverflow profiles can be a helpful decision input. They show what the candidate can bring to the team but also which skills he/she is going to improve. It's also good to know what the candidate's responsibilities within the team will be (mentoring, code delivery only, ...).
Perks. What are the extra benefits? Although they shouldn't be the crucial criterion, it's good to know if you can count on meal vouchers or a dedicated learning budget.

I'm not claiming that all offers must include all these details. These are only my observations that I wanted to share with you to, hopefully, contribute to making our data engineering world better. One day we'll all look for a new job, and there is nothing worse than not being able to pick out the good offers.
