Articles about Data engineering on GCP on waitingforcode.com - articles for the pleasure of learning and discovery

April 18, 2021 • Data engineering on GCP

GCP Dataflow by an Apache Spark guy

Some months ago I wrote a blog post where I presented BigQuery from the perspective of an Apache Spark user. Today I will do the same exercise, but this time for a service from the same category as Apache Spark: data processing frameworks. In other words, I will try to understand GCP Dataflow through the lens of my Apache Spark knowledge!

February 28, 2021 • Data engineering on GCP

GCP BigQuery by an Apache Spark guy

One of the steps in my preparation for the GCP Data Engineer certificate was working with the "Google BigQuery: The Definitive Guide: Data Warehousing, Analytics, and Machine Learning at Scale" book. To be honest, I didn't expect that knowing Apache Spark would help me so much in understanding its architectural concepts. If you don't believe me, I will try to convince you in this blog post.

January 31, 2021 • Data engineering on GCP

My journey to GCP Data Engineer

Last December I passed the GCP Data Engineer exam and got my certification as a late Christmas gift! As with the AWS Big Data Specialty certification, I would like to share some feedback from my preparation process. Spoiler alert: I did it without taking any online course!

December 27, 2020 • Data engineering on GCP

Lakehouse and BigQuery?

You already know me: I'm a big fan of Apache Spark, but also of all kinds of patterns. One of the patterns gaining popularity these days is the lakehouse. Most of the time (always?), this pattern is implemented on top of an ACID-compliant table format like Apache Hudi, Apache Iceberg or Delta Lake. But can we do it differently and use another storage layer, like BigQuery?
