Looking for something else? Check the categories of Cloud:
Data engineering on AWS Data engineering on Azure Data engineering on GCP Data engineering on the cloud
If not, below you can find all articles belonging to Cloud.
It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 5 months.
The "vertical scaling" has caught my attention a few times already when I have been reading about cloud updates. I've always considered horizontal scaling as the single true scaling policy for elastic data processing pipelines. Have I been wrong?
It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 4 months.
Lessons learned why it's always worth checking the code implementation to avoid surprises later. Even for vendor-supported solutions.
It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 3 months.
I have used to say "Kinesis Data Streams is like Apache Kafka, an append-only streaming broker with partitions and offsets". Although often it's true, it's not that simple unfortunately.
Open Source tools helped me switch to the cloud world a lot. The managed cloud services often share the same fundamentals as their Open alternatives. However, there is always something different. Today I'll focus on these differences for Amazon Kinesis service and Apache Kafka ecosystem.
Have you missed any cloud data engineering-related news in the last 3 months? No worries, I got you covered with the new part of the "What's new on the cloud for data engineers..." series.
There is always a gap between a disruption in the data engineering industry and its integration on the cloud. It was not different for table file formats which have started gaining interest on AWS, Azure, GCP recently.
It's the last update on the data engineering news on the cloud this year. There are a lot of things coming out. Especially for the streaming processing!
Setting a data processing layer up has several phases. You need to write the job, define the infrastructure, CI/CD pipeline, integrate with the data orchestration layer, ... and finally, ensure the job can access the relevant datasets. The most basic authentication mechanism uses login/password pair but can we do better on the cloud? Let's see!
I've discovered the term from the title while learning Azure Synapse and Cosmos DB services. I had heard of NoSQL, or even NewSQL, but never of a solution supporting analytical and transactional workloads at once.
Four months in cloud history is a huge period of time. Even when 2 of the 4 months are the usual "holiday" months. As you can guess from the title, it's time to see what changed recently on the cloud from a data engineering perspective!
When I prepare the "What's new on the cloud..." series, I'm pretty sure that for Azure the most updates will go to the Azure SQL service. The main idea of the service is simple but if you analyze it more deeply, you'll find some concepts that might not be the easiest to understand at first.
It's time for the first cloud news blog post this year. The update summary lists all changes of data or data-related services between January 1 and April 25.
Data ingestion is the starting point for all data systems. It can work in batch or streaming mode. I've recently covered the batch ingestion pretty much already with previous blog posts but I haven't done anything for the streaming, yet. Until today when you can read a few words about HTTP-based data ingestion to cloud streaming brokers.
Data is a live being. It's getting queried, written, overwritten, backfilled and ... migrated. Since the last point is the least obvious from the list, I've recently spent some time trying to understand it better in the context of the cloud.
Data migration is one of the scenarios you can face as a data engineer. It's not always an easy task but managed cloud services can help you to put in place the pipeline and solve many common problems.
The volume of the data to migrate from an on-premise to a cloud environment will probably be less significant than previous years since a lot of organizations are already on the cloud. However, it's interesting to see different methods to bring the data there and that's something I'll show you in this blog post.
Writing data processing jobs is a fascinating task. But it can't be worthless if the users can't find and use the generated data. Fortunately, we can count on data catalogs and leverage the power of metadata to overcome this discoverability issue.