My name is Bartosz Konieczny and I am a freelance data engineer. I like applying software engineering best practices to the data pipelines I write in Scala, Python, or Java. I specialize in Apache Spark, cloud data engineering, and stream processing.

How can I help you?

It depends - you know already, I'm a consultant ;-) More seriously, it depends on the problem you have! I can help with any data engineering issue, but I will not be the best person to implement a Machine Learning algorithm or a backend service.

My areas of expertise are:

  • Apache Spark
  • AWS, Azure and GCP data services
  • stream processing
  • Java, Scala, Python, SQL
  • software craftsmanship in the data context, with a focus on clean code

Besides, I'm also a follower of the "you build it, you run it" principle, hence CI/CD pipelines (GitLab CI, Jenkins) and IaC (Terraform) are part of my daily work routine.

I didn't include in the list above all the technologies I've worked with so far, since I don't consider myself an expert in them. But to complete the picture, I've also worked with other data technologies, including Apache Airflow, Apache Beam (Dataflow runner), ETL- and ELT-based batch processing, and serverless functions.

What can I bring to your team?

You know my hard skills already. But in addition to them, I'm fully committed to the teams and people I work with. As a lifelong learner, I'm always looking to bring innovation to the code as well as to the people. If you are here, you know I like to share my discoveries on the waitingforcode.com blog, in the hope of spreading knowledge throughout the data community. I also do this privately with my teammates by leading internal workshops, preparing POCs, and improving the code quality and team skills through code reviews.

What have I done in recent years?

I don't like to stay idle and am continuously looking for new data challenges. Below is the list of projects I have worked on:

Category | Problem | Tech stack
Data cleansing | A stateful streaming job preparing data for the Silver layer, including modifications such as standardization, reformatting, and deduplication. | Apache Spark Structured Streaming, AWS, Jenkins, Scala
Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | AWS, Scala
Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | Apache Beam, GCP, Java, Cloud Build
Data migration | Batch migration from Hadoop Hive to GCP BigQuery. | Apache Spark, GCP, Scala
Data preparation | Data cleansing, normalization, and enrichment for a predictive ML use case. | Apache Airflow, Apache Spark SQL, Azure, Azure DevOps, Python
Data privacy | GDPR right-to-be-forgotten system in the data warehouse based on GCP BigQuery. | Apache Airflow, GCP, SQL, Python
Data validation | Real-time, stateful data quality pipeline validating the order of the transformed events. | Apache Spark Structured Streaming, AWS, Scala
Data visualization | Delivering sensor data to PowerBI in near real time. | Azure, Azure DevOps, Python
ELT | Various business-related, SQL-only batch pipelines, including session generation, data cleansing, and data preparation for data marts used by data analysts. | Apache Airflow, GCP, GitHub Actions, Python, SQL
Reverse ETL | Real-time streaming pipeline integrating relevant events into an external CRM tool. | Apache Beam, GCP, Java, Cloud Build
Ordered streaming data delivery | Streaming pipeline delivering multi-tenant data under ordering and maximum latency constraints. | Apache Spark Structured Streaming, AWS, GitLab CI, Scala
Sessionization | Hourly batch jobs generating sessions, with a follow-up part ingesting the data into the data warehouse layer. I also gave a talk on that topic. | Apache Airflow, AWS, Apache Spark SQL, Scala, Python, Jenkins
Sessionization | Real-time user session generation with the help of session windows. | Apache Beam, GCP, GCP Cloud Build, Java, Terraform
Sessionization | Serverless sessionization pipeline relying on Change Data Capture to provide user activity insights in near real time. | AWS, Scala, Terraform

Can't find your problem in the list? Well, I haven't had a chance to work on it yet, but I'm excited to help as long as it stays in the data engineering landscape!

We need to chat!

Send me an email describing your project at work@waitingforcode.com. No worries if it's not very detailed, but please include the nature of the problem and the technology stack.