When it comes to executing one isolated job, there are many choices, and using a data orchestrator is not always necessary. However, that's not true for the opposite scenario, where a data orchestrator not only orchestrates the workload but also provides a monitoring layer. And so the question arises: what to do on the cloud?
I first heard about the time travel feature with Delta Lake. But after digging a bit, I found that it's not a purely Delta Lake concept! In this blog post I will show you which cloud services implement it too.
When I was writing my previous blog post about losing data on the cloud, I wanted to call it "data loss prevention". It turns out that this term is already reserved for a different problem - the problem that I will cover just below.
Data is a valuable asset and nobody wants to lose it. Unfortunately, it can happen - even with cloud services. Fortunately, thanks to their features, we can reduce this risk!
You have all certainly heard about EMR, Databricks, Dataflow, DynamoDB, BigQuery or Cosmos DB. Those are well-known data services from AWS, Azure and GCP, but besides them, cloud providers offer some - often lesser-known - services worth considering in data projects. Let's see some of them in this blog post!
It's time for the 4th part of the "What's new on the cloud for data engineers" series. This time I will cover the changes between May and August.
Guess what topic I was afraid of at the beginning of my cloud journey as a data engineer? Networking! VPC, VPN, firewalls, ... I thought I would be able to live without the networking lessons from school, but how wrong I was! IMO, as a data engineer, you should know a bit about networking, since it's often related to the security part of the architectures you'll design. And in this article, I'll share with you some networking points I would have liked to know before starting to work on the cloud.
The easiest way to learn is by doing, but what if it involves leaving your credit card number beforehand? I've never been comfortable with that, but there is no other way to get some hands-on experience on the cloud. Fortunately, it doesn't mean you can't control your expenses. In this article, we'll see how.
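To give a rough idea before the full walkthrough, here is a minimal sketch of one option, assuming the AWS side and the boto3 Budgets client; the account id, budget name and e-mail address are placeholders, and the other providers offer comparable budget and alerting features.

```python
import boto3

# Minimal sketch: a monthly cost budget of 10 USD with an e-mail alert
# once 80% of it is consumed. Account id, budget name and address are
# placeholders for this example.
budgets = boto3.client("budgets")
budgets.create_budget(
    AccountId="123456789012",
    Budget={
        "BudgetName": "learning-sandbox",
        "BudgetLimit": {"Amount": "10", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # percent of the budgeted amount
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "me@example.com"}
            ],
        }
    ],
)
```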
I was writing this blog post while preparing for Azure's DP-200 and DP-201 certifications. Why? To clean things up in my head, organize what I learned about Azure Stream Analytics, and compare it with what I knew about AWS Kinesis Analytics.
I still haven't fully understood why the story of data architectures is the story of Greek letters. Over time, they changed context and had to adapt from an on-premise environment, where they often shared the same main services, to the cloud. In this blog post, I will briefly present these data architectures and try to map them to cloud data services on AWS, Azure and GCP. Spoiler alert: there will be more pictures than usual!
Guess what? My time-consuming learning mode based on reading the documentation paid off again! This time on Azure, because while reading about Stream Analytics windows I discovered that I had missed some of them in the past. And since today is the day of the cloud, I will see whether the same types of windows exist in AWS and GCP streaming services. And if not, what the differences are.
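To fix the vocabulary before the comparison, here is a minimal sketch of one of those window types - a 60-second tumbling (fixed) window - written with Apache Beam, the model behind GCP Dataflow; the in-memory events and their timestamps are made up for the example.

```python
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.window import TimestampedValue

# Minimal sketch: count events per key in 60-second tumbling windows.
# The in-memory events and their timestamps stand in for a real
# streaming source such as Pub/Sub.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Events" >> beam.Create([("click", 1), ("click", 1), ("view", 1)])
        | "AddTimestamps" >> beam.Map(lambda event: TimestampedValue(event, 10))
        | "Tumbling60s" >> beam.WindowInto(window.FixedWindows(60))
        | "CountPerKey" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```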
You know me already. I like to compare things to spot differences and similarities. This time, I will do this exercise for cloud data warehouses: AWS Redshift and Azure Synapse Analytics.
Believe it or not, data processing is not only about Big Data. Even though data is one of the most important assets for modern data-driven companies, there is still a need to process small data. And to do that, you will not necessarily use the same tools as for bigger datasets.
The next step of my multi-cloud exploration will be object stores. In this article I will try to find similarities between S3, Storage Account and GCS.
Data is not always as clean as we would like it to be. This statement is even truer for semi-structured formats like JSON, where we feel like we're working with a structure but, unfortunately, it's not enforced. Hence, from time to time, our code can unexpectedly fail. To handle this problem - as for many others - there is a pattern. It's called dead-letter and I will describe it below in the context of cloud services.
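To give a rough idea of the pattern before looking at the cloud services, here is a minimal, provider-agnostic sketch; `process` and `send_to_dead_letter` are hypothetical callbacks that, on the cloud, would typically map to a processing function and a write to a dedicated queue or storage location.

```python
import json

def handle_records(raw_records, process, send_to_dead_letter):
    """Process semi-structured records without letting one bad payload
    fail the whole batch: failures are routed to a dead-letter sink."""
    for raw in raw_records:
        try:
            event = json.loads(raw)  # the structure is not enforced, so this can fail
            process(event)
        except (json.JSONDecodeError, KeyError, TypeError) as error:
            # keep the original payload and the failure reason so the record
            # can be inspected and replayed later
            send_to_dead_letter({"payload": raw, "error": str(error)})
```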
A streaming broker is one of the most common entry points for modern data systems. Since they're running on the cloud, and since one of my goals for this year is to acquire a multi-cloud vision, it's the moment to see what AWS, Azure and GCP propose in this field!
It's time for the 3rd part of the "What's new on the cloud for data engineers" series. This time I will cover the changes between February and April.
Even though storage is cheap and virtually unlimited, it doesn't mean we have to store all the data all the time. And to deal with this lifecycle requirement, we can either write a pipeline that removes obsolete records or rely on the cloud services' data management offerings. I propose a short overview of them in this blog post.
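As a teaser of the managed option, here is a minimal sketch, assuming the AWS side with boto3 and a hypothetical bucket where raw events older than 30 days can be dropped; Azure and GCP expose similar lifecycle rules on their object stores.

```python
import boto3

# Minimal sketch: let S3 expire objects under the raw/ prefix after 30 days,
# instead of writing a cleanup pipeline. The bucket name is a placeholder.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-raw-events",
                "Filter": {"Prefix": "raw/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```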
Despite the recent architectural proposals built on the lakehouse principle, a data warehouse is still an important part of a data system. But there is no single way to build one, and if you analyze the cloud providers, you will see various offerings, like Redshift (AWS) or BigQuery (GCP), presented in this article.
As you know from the last 2020 blog post, one of my new goals is to become proficient at working with AWS, Azure and GCP data services. One of the building blocks of that process is finding some patterns and identifying the differences. Before doing that exercise for BigTable (GCP) and DynamoDB (AWS), I thought both were pretty much the same. However, you can't imagine how wrong I was with this assumption!