Hi,
My name is Bartosz Konieczny and I am a freelance data engineer. I like applying software engineering best practices to the data pipelines I write in Scala, Python, or Java. I specialize in Apache Spark, cloud data engineering, and stream processing.
It depends - you know already, I'm a consultant ;-) More seriously, it depends on the problem you have! I can help with any data engineering issue, but I will not be the best person to implement a Machine Learning algorithm or a backend service.
My areas of expertise are:
Besides, I also follow the "you build it, you run it" principle, hence CI/CD pipelines (GitLab CI, Jenkins) and IaC (Terraform) are part of my daily work routine.
I didn't include every technology I've worked with in the list above, since I don't consider myself an expert in all of them. But to complete the picture, I've also worked with other data technologies, including Apache Airflow, Apache Beam (Dataflow runner), ETL- and ELT-based batch processing, and serverless functions.
You know my hard skills already. But in addition to them, I'm fully committed to the teams and people I work with. As a lifelong learner, I'm always looking to bring innovation to the code as well as to people. If you are here, you know I like to share my discoveries on the waitingforcode.com blog, in the hope of spreading knowledge throughout the data community. I also do this privately with my teammates by leading internal workshops, preparing POCs, and improving the code quality and team skills through code reviews.
I don't like staying idle and I'm continuously looking for new data challenges. Below is the list of projects I have worked on:
Category | Problem | Tech stack |
---|---|---|
Data cleansing | A stateful streaming job preparing data for the Silver layer, including modifications such as standardization, reformatting, and deduplication. | Apache Spark Structured Streaming, AWS, Jenkins, Scala |
Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | AWS, Scala |
Data dispatching | A real-time serverless job classifying data as valid or invalid and dispatching it to dedicated streams. | Apache Beam, GCP, Java, Cloud Build |
Data migration | Batch migration from Hadoop Hive to GCP BigQuery. | Apache Spark, GCP, Scala |
Data preparation | Data cleansing, normalization, and enrichment for a predictive ML use case. | Apache Airflow, Apache Spark SQL, Azure, Azure DevOps, Python |
Data privacy | GDPR right-to-be-forgotten system in the data warehouse based on GCP BigQuery. | Apache Airflow, GCP, SQL, Python |
Data visualization | Delivering sensor data to Power BI in near real time. | Azure, Azure DevOps, Python |
ELT | Various business-related, SQL-only batch pipelines, including session generation, data cleansing, and data preparation for data marts used by data analysts. | Apache Airflow, GCP, GitHub Actions, Python, SQL |
Reverse ETL | Real-time streaming pipeline integrating relevant events into an external CRM tool. | Apache Beam, GCP, Java, Cloud Build |
Ordered streaming data delivery | Streaming pipeline delivering multi-tenant data under ordering and maximum latency constraints. | Apache Spark Structured Streaming, AWS, GitLab CI, Scala |
Sessionization | Hourly batch jobs generating sessions, followed by data ingestion into the data warehouse layer. I also gave a talk on that topic. | Apache Airflow, AWS, Apache Spark SQL, Scala, Python, Jenkins |
Sessionization | Real-time user session generation with the help of session windows. | Apache Beam, GCP, GCP Cloud Build, Java, Terraform |
Sessionization | Serverless sessionization pipeline relying on Change Data Capture to provide user activity insights in near real time. | AWS, Scala, Terraform |
Can't find your problem in the list? Well, I haven't had a chance to work on it yet, but I'm excited to help as long as it stays in the data engineering landscape!
Send me an email with your project at work@waitingforcode.com. No worries if it's not very detailed, but please include the nature of the problem and the technology stack.