Cloud articles

Looking for something else? Check the categories of Cloud:

Data engineering on AWS
Data engineering on Azure
Data engineering on GCP
Data engineering on the cloud

If not, below you can find all articles belonging to Cloud.

What's new on the cloud for data engineers - part 7 (05-08.2022)

Four months is a long time in cloud history, even when 2 of the 4 are the usual "holiday" months. As you can guess from the title, it's time to see what has changed recently on the cloud from a data engineering perspective!

Shedding some light on Azure SQL

When I prepare the "What's new on the cloud..." series, I can be pretty sure that, for Azure, most of the updates will concern the Azure SQL service. The main idea of the service is simple, but if you analyze it more deeply, you'll find some concepts that might not be the easiest to understand at first.

What's new on the cloud for data engineers - part 6 (01-04.2022)

It's time for the first cloud news blog post of the year. The update summary lists all changes to data and data-related services between January 1 and April 25.

HTTP-based data ingestion to streaming brokers

Data ingestion is the starting point for all data systems. It can work in batch or streaming mode. I've covered batch ingestion fairly thoroughly in previous blog posts, but I haven't written anything about streaming yet. That changes today, with a few words about HTTP-based data ingestion to cloud streaming brokers.

Data migration on the cloud

Data is a living thing. It gets queried, written, overwritten, backfilled and ... migrated. Since the last item on the list is the least obvious, I've recently spent some time trying to understand it better in the context of the cloud.

Database management services on the cloud

Data migration is one of the scenarios you can face as a data engineer. It's not always an easy task, but managed cloud services can help you put the pipeline in place and solve many common problems.

Data ingestion to the cloud object store

The volume of data to migrate from an on-premise to a cloud environment will probably be smaller than in previous years, since a lot of organizations are already on the cloud. However, it's still interesting to see the different methods for bringing the data there, and that's what I'll show you in this blog post.

Data catalog services

Writing data processing jobs is a fascinating task. But it can be worthless if the users can't find and use the generated data. Fortunately, we can count on data catalogs and leverage the power of metadata to overcome this discoverability issue.

Complex Event Processing on the cloud

When I first came across the Complex Event Processing (CEP) term, I was scared. Event stream processing itself was complex enough, so why this extra complex-specific stuff? It turns out that the complexity is real, but in this post I will focus on a different aspect: which services support CEP on the cloud?

Identities and permissions management

AWS was the first cloud provider I worked with. That's why, when I did my first Azure and GCP projects, I kept asking myself, "Hey, how would you implement that on AWS?". Answering that question was easy most of the time, but sometimes I got stuck. One of my most significant issues was the identity and permissions management component. I will try to share some related answers in this blog post.

Data wrangling on the cloud

Data is not perfect, and in each project, you'll probably need to do some cleaning to prepare it for business use cases. To make this task easier, cloud providers have dedicated data wrangling services, and they'll be the topic of this blog post.

Serverless MapReduce?

Is it possible to implement the MapReduce paradigm on top of cloud serverless functions? Technically yes, and there are some reference architectures I'm going to discuss in this blog post. Is it a good idea? It depends on the context, and hopefully you'll be able to figure out the answer after reading my thoughts.

What's new on the cloud for data engineers - part 5 (09-12.2021)

It's time for the 5th part of the "What's new on the cloud for data engineers" series. This time I will cover the changes between September and December.

Schema management in cloud streaming services

When I say "schema management" and "streaming", you'll certainly think of the schema registry of Apache Kafka. That's fair, but cloud streaming services manage schemas too, and in this blog post we'll see how.

Testing streaming data systems on the cloud - ideas

That's one of the biggest problems I've faced in my whole career. The development environment! I'm not talking here about creating cloud resources in a different subscription but about an environment that shares similar characteristics with production. In this blog post I'll share different strategies you can put in place in the context of the cloud and streaming applications.

Scaling data processing on the cloud

Processing static datasets is easier than processing dynamic ones that may change over time. Fortunately, cloud services offer various features, some more manual than others, to scale the data processing logic. We'll see some of them in this blog post.

Hush! It's a secret on the cloud

How to manage secrets is probably one of the first problems you may encounter while deploying resources from a CI/CD pipeline. The simple answer is: don't manage them at all! Let the cloud services do it.

Data sharing on the cloud

One of the big announcements of the previous Data+AI Summit was Delta Sharing, a protocol to exchange live data with internal and external users. The question I asked myself at that moment was "Does it exist on the cloud?". Let's see.

Data orchestration on the cloud

When it comes to executing one isolated job, there are many choices and using a data orchestrator is not always necessary. However, that's not the case in the opposite scenario, where a data orchestrator not only orchestrates the workload but also provides a monitoring layer. So the question arises: what to do on the cloud?

Time travel on the cloud

I first heard about the time travel feature in the context of Delta Lake. But after digging a bit, I found that it's not purely a Delta Lake concept! In this blog post I will show you which cloud services implement it too.
