I am a freelance Data Engineer, author of Data Engineering Design Patterns and a Databricks MVP. I help companies build scalable, maintainable, and cost-effective data platforms.
I specialize in the "You build it, you run it" philosophy. I don't just deliver code; I deliver automated, tested, and observable systems.
Core Expertise
I focus on solving complex architectural puzzles, not just connecting tools:
| Category | Problem | Tech stack | Result |
|---|---|---|---|
| Data cleansing | A stateful streaming job preparing data for the Silver layer, including transformations such as standardization, reformatting, and deduplication (see the deduplication sketch below the table). | Apache Spark Structured Streaming, AWS, Jenkins, Scala | |
| Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | AWS, Scala | Resolved cluster maintenance friction by migrating the dispatching logic to serverless functions. |
| Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | Apache Beam, GCP, Java, Cloud Build | Simplified the overall architecture by clearly classifying data as "ready for processing" or "requiring investigation". Centralized the dispatching logic in a single, scalable, and fully serverless component instead of many isolated jobs doing the same work. |
| Data migration | Batch migration from Hadoop Hive to GCP BigQuery. | Apache Spark, GCP, Scala | Enabled decommissioning of the old and expensive Hadoop stack. |
| Data migration | Migrating PowerBI Dataflows to Databricks with PySpark. | Azure, PowerBI, Databricks, Python | Resolved a knowledge gap in the team responsible for this data transformation work. With an as-code solution, fully versioned on GitHub, and a clear review process, team members gained a better understanding of the overall work done with the data. |
| Data preparation | Data cleansing, normalization, and enrichment for a predictive ML use case. | Apache Airflow, Apache Spark SQL, Azure, Azure DevOps, Python | |
| Data privacy | GDPR right-to-be-forgotten system in the data warehouse based on GCP BigQuery. | Apache Airflow, GCP, SQL, Python | Ensured regulatory data erasure through scalable, automated, and auditable SQL-based deletion logic. |
| Data validation | Real-time stateful data quality pipeline validating the ordering of transformed events. | Apache Spark Structured Streaming, AWS, Scala | Improved the quality of the feedback provided by the customer support team. |
| Data visualization | Delivering sensor data to PowerBI in near real time. | Azure, Azure DevOps, Python | Helped the IoT monitoring team gain better insight into key health indicators of the IoT devices. |
| ELT | Various business-oriented, batch, SQL-only pipelines, including session generation, data cleansing, and data preparation for data marts used by data analysts (see the idempotent load sketch below the table). | Apache Airflow, GCP, GitHub Actions, Python, SQL | Improved the resilience of the pipelines by implementing idempotency patterns, reducing maintenance pressure on the team. Previously, team members had to perform manual actions on the tables, such as DELETEs or TRUNCATEs, before starting any reprocessing. |
| Reverse ETL | Real-time streaming pipeline integrating relevant events into an external CRM tool. | Apache Beam, GCP, Java, Cloud Build | Enabled real-time data insights for the marketing team, based on the data present in the data mart. |
| Ordered streaming data delivery | Streaming pipeline delivering multi-tenant data under ordering and maximum-latency constraints. | Apache Spark Structured Streaming, AWS, GitLab CI, Scala | |
| Scaling automation | Automating project deployment for new regions. | AWS, Databricks, Python | Reduced deployment time from 2 weeks to 1 day by replacing manual configuration with automated Jinja-powered templates deployed from a configuration file. |
| Sessionization | Hourly batch jobs generating sessions, followed by data ingestion into the data warehouse layer. I also gave a talk on that topic. | Apache Airflow, AWS, Apache Spark SQL, Scala, Python, Jenkins | Implemented the backbone for future data use cases within the organization by rewriting an unsuccessful SQL-on-Teradata PoC into a more scalable cloud environment leveraging Apache Spark for distributed processing. |
| Sessionization | Real-time user session generation with session windows (see the session-window sketch below the table). | Apache Beam, GCP, GCP Cloud Build, Java, Terraform | Improved the precision of geolocated recommendations by providing additional context to the backend service with session windows generated in near real time (< 2 min latency). |
| Sessionization | Serverless sessionization pipeline relying on Change Data Capture to provide user activity insight in near real time. | AWS, Scala, Terraform | Established a single source of truth for audience analytics by integrating with an external audience measurement system, providing the organization with certified viewership validation for all broadcast channels. |
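Technique Sketches

To make the table more concrete, here are a few short sketches in Scala. First, the stateful deduplication pattern from the data cleansing row: the watermark bounds the deduplication state, so the job can run indefinitely without unbounded memory growth. This is a minimal sketch; the Kafka topic, JSON layout, and paths are illustrative assumptions, not the original project's setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SilverCleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("silver-cleansing-sketch").getOrCreate()
    import spark.implicits._

    // Illustrative Bronze source; topic name and JSON fields are assumptions.
    val bronze = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "bronze-events")
      .load()
      .select(
        get_json_object($"value".cast("string"), "$.event_id").as("event_id"),
        // Assumes ISO-8601 timestamps in the payload.
        get_json_object($"value".cast("string"), "$.event_time").cast("timestamp").as("event_time"),
        get_json_object($"value".cast("string"), "$.country").as("country")
      )

    val silver = bronze
      // Standardization/reformatting step (illustrative): normalize the country code.
      .withColumn("country", upper(trim($"country")))
      // Stateful deduplication: the watermark lets Spark purge state for events
      // older than 1 hour, keeping one record per (event_id, event_time).
      .withWatermark("event_time", "1 hour")
      .dropDuplicates("event_id", "event_time")

    silver.writeStream
      .format("parquet")
      .option("path", "/data/silver/events")                   // illustrative path
      .option("checkpointLocation", "/data/checkpoints/silver") // illustrative path
      .start()
      .awaitTermination()
  }
}
```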
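Next, the idempotency pattern from the ELT row. The original pipelines were SQL-based on GCP; this sketch transposes the same idea to Spark's dynamic partition overwrite, which replaces only the partitions being rewritten, so re-running a day requires no manual DELETE or TRUNCATE. The paths and execution date are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object IdempotentDailyLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("idempotent-daily-load-sketch")
      // Only the partitions present in the written DataFrame are replaced;
      // all other partitions of the target stay untouched.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()

    // In practice the orchestrator (e.g. Apache Airflow) injects this date.
    val executionDate = "2024-01-15"

    val daily = spark.read.parquet("/data/raw/orders")          // illustrative input
      .where(s"event_date = '$executionDate'")

    // Re-running this job for the same date simply rewrites the same partition,
    // which makes the load idempotent by construction.
    daily.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("/data/marts/orders")                            // illustrative output
  }
}
```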
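Finally, the session windows from the real-time sessionization row. That project used Apache Beam on GCP; to keep a single language across these examples, the sketch below shows the analogous construct with Spark's session_window function (available since Spark 3.2). The 10-minute gap, the watermark, and the rate source are illustrative stand-ins for the real event stream.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SessionizationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sessionization-sketch").getOrCreate()
    import spark.implicits._

    // Built-in test source standing in for the real event stream.
    val events = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()
      .withColumn("user_id", ($"value" % 5).cast("string"))
      .withColumnRenamed("timestamp", "event_time")

    // A session closes after 10 minutes of inactivity for a given user;
    // the watermark lets Spark finalize and emit closed sessions.
    val sessions = events
      .withWatermark("event_time", "15 minutes")
      .groupBy($"user_id", session_window($"event_time", "10 minutes"))
      .agg(count("*").as("events_in_session"))

    sessions.writeStream
      .outputMode("append")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```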
To ensure the best results, I typically work under a clearly defined collaboration model.
If you have a data engineering challenge, especially involving Databricks, Spark, or Streaming, I'd love to hear about it.
To help me give you a quick "Yes/No" on fit, please include a short description of your challenge and your current stack.
Email me at contact@waitingforcode.com.