Articles about Data engineering on the cloud on waitingforcode.com - articles for the pleasure of learning and discovery

February 14, 2024 • Data engineering on the cloud

What's new on the cloud for data engineers - part 12 (10.2023-02.2024)

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 5 months.

December 6, 2023 • Data engineering on the cloud

Vertical autoscaling for data processing on the cloud

"Vertical scaling" has caught my attention a few times while reading about cloud updates. I've always considered horizontal scaling the only true scaling policy for elastic data processing pipelines. Have I been wrong?

September 20, 2023 • Data engineering on the cloud

What's new on the cloud for data engineers - part 11 (06-09.2023)

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 4 months.

June 10, 2023 • Data engineering on the cloud

What's new on the cloud for data engineers - part 10 (03-05.2023)

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 3 months.

March 21, 2023 • Data engineering on the cloud

What's new on the cloud for data engineers - part 9 (01-03.2023)

Have you missed any cloud data engineering news in the last 3 months? No worries, I've got you covered with the new part of the "What's new on the cloud for data engineers..." series.

March 9, 2023 • Data engineering on the cloud

Table file formats are on the cloud

There is always a gap between a disruption in the data engineering industry and its integration on the cloud. It was no different for table file formats, which have recently started gaining interest on AWS, Azure, and GCP.

December 28, 2022 • Data engineering on the cloud

What's new on the cloud for data engineers - part 8 (09-12.2022)

It's the last update on the data engineering news on the cloud this year. There is a lot coming out, especially for stream processing!

December 10, 2022 • Data engineering on the cloud

Cloud authentication and data processing jobs

Setting up a data processing layer involves several phases. You need to write the job, define the infrastructure and the CI/CD pipeline, integrate with the data orchestration layer, ... and finally, ensure the job can access the relevant datasets. The most basic authentication mechanism uses a login/password pair, but can we do better on the cloud? Let's see!

September 4, 2022 • Data engineering on the cloud

What's new on the cloud for data engineers - part 7 (05-08.2022)

Four months is a long time in cloud history, even when 2 of the 4 months are the usual "holiday" months. As you can guess from the title, it's time to see what changed recently on the cloud from a data engineering perspective!

May 1, 2022 • Data engineering on the cloud

What's new on the cloud for data engineers - part 6 (01-04.2022)

It's time for the first cloud news blog post this year. The update summary lists all changes of data or data-related services between January 1 and April 25.

April 24, 2022 • Data engineering on the cloud

HTTP-based data ingestion to streaming brokers

Data ingestion is the starting point for all data systems. It can work in batch or streaming mode. I've already covered batch ingestion fairly thoroughly in previous blog posts, but I haven't written anything about streaming yet. That changes today, with a few words about HTTP-based data ingestion to cloud streaming brokers.

April 17, 2022 • Data engineering on the cloud

Data migration on the cloud

Data is a living thing. It gets queried, written, overwritten, backfilled and ... migrated. Since the last point is the least obvious on the list, I've recently spent some time trying to understand it better in the context of the cloud.

April 10, 2022 • Data engineering on the cloud

Database management services on the cloud

Data migration is one of the scenarios you can face as a data engineer. It's not always an easy task, but managed cloud services can help you put the pipeline in place and solve many common problems.

March 20, 2022 • Data engineering on the cloud

Data ingestion to the cloud object store

The volume of data to migrate from an on-premises to a cloud environment will probably be less significant than in previous years, since a lot of organizations are already on the cloud. However, it's interesting to see the different methods for bringing data there, and that's what I'll show you in this blog post.

March 13, 2022 • Data engineering on the cloud

Data catalog services

Writing data processing jobs is a fascinating task. But it can be worthless if users can't find and use the generated data. Fortunately, we can count on data catalogs and leverage the power of metadata to overcome this discoverability issue.

March 6, 2022 • Data engineering on the cloud

Complex Event Processing on the cloud

When I first encountered the term Complex Event Processing (CEP), I was scared. Event stream processing itself was complex enough, so why this extra complex-specific stuff? It turns out that the complexity is real, but in this post I will focus on a different aspect: what are the services supporting CEP on the cloud?

February 6, 2022 • Data engineering on the cloud

Identities and permissions management

AWS was the first cloud provider I worked with. That's why, when I did my first Azure and GCP projects, I kept asking myself, "Hey, how would you implement that on AWS?". Answering that question was easy most of the time, but sometimes I got stuck. One of my most significant issues was the identity and permissions management component. I will try to share some related answers in this blog post.

January 30, 2022 • Data engineering on the cloud

Data wrangling on the cloud

Data is not perfect, and in each project, you'll probably need to do some cleaning to prepare it for business use cases. To make this task easier, cloud providers have dedicated data wrangling services, and they'll be the topic of this blog post.

January 2, 2022 • Data engineering on the cloud

What's new on the cloud for data engineers - part 5 (09-12.2021)

It's time for the 5th part of the "What's new on the cloud for data engineers" series. This time I will cover the changes between September and December.

December 26, 2021 • Data engineering on the cloud

Schema management in cloud streaming services

When I say "schema management" and "streaming", you'll certainly think of the schema registry of Apache Kafka. That's true, but cloud streaming services also manage schemas, and in this blog post we'll see how.
