Become a better Data Engineer with

Master stream processing

You've had your first success with batch data pipelines and have now been asked to implement your first stream processing jobs?

You shouldn't think of stream processing as just batch processing on unbounded data. It's much more than that!

There are various stream processing concepts. This 3-module course will lead you through them, from data ingestion to stateful stream processing!

Master Apache Spark Structured Streaming (WIP)

On hold until September 2024.

The module is under development. Join the waiting list to be notified about the release.

You've joined a news company. The company publishes news on a website and is just at the beginning of its data journey.

So far, they've been relying on batch processing to generate insights. They have been using tools like Apache Spark SQL, Apache Airflow, an object store, and a data warehouse.

However, the project requires near real-time processing capabilities in many places. You're the one who will lead this batch-to-streaming transformation!

Your goal is to go through the course and solve each homework exercise using what you've learned so far. By the end of the course, your system will be streaming-first and you'll take a few months off followed by a raise, as promised by your Head of Data Engineering 🙂

Master batch processing (WIP)

On hold until September 2024.

The module is under development. Join the waiting list to be notified about the release.

Is writing batch pipelines more than just defining Extract, Load and Transform steps?

Yes, much more! Batch processing has its own data flow patterns, data exposition patterns, idempotency, backfilling, and data quality aspects.

It may be difficult to get them all right from day 1, unless you take a shortcut. This 3-module course is one of those shortcuts.

Data engineering patterns on the cloud

How to solve common data engineering problems with cloud services?

A comprehensive list of data processing, data storage, data security, data warehouse, data management, data orchestration, and data transfer patterns to apply on the cloud.

Onsite trainings

Besides the self-paced learning experience, I also provide custom trainings:

Course | Duration | Maximum students
Data Engineering 101 | 3 days | 10
Apache Spark Structured Streaming, basics and beyond | 2 days | 10
Stream processing for data engineers | 2 days | 10
Software engineering best practices for data engineers | 2 days | 8