What's new on the cloud for data engineers - part 5 (09-12.2021)

It's time for the 5th part of the "What's new on the cloud for data engineers" series. This time I cover the changes announced between September and December 2021.

I cover the data services here, with a few exceptions that mostly relate to security services. I omit version upgrades, which are frequent, especially for the managed RDBMS services. This time I also try to highlight the most important features, although the selection is subjective.

AWS

Athena

Query execution:

ACID:

Governed tables

Governed tables are Glue Data Catalog tables that support ACID transactions and benefit from data layout optimizations, such as small files compaction.
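To give an idea of what "small files compaction" means, here is a minimal sketch of a greedy compaction planner. It is only an illustration and not the way Lake Formation implements the optimization; the `plan_compaction` function, the sizes expressed in MB, and the 128 MB target are all assumptions of mine.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int = 128) -> List[List[int]]:
    """Greedily group small files into batches whose total size stays within target_mb.

    Hypothetical helper: files at or above the target are left untouched, and a
    batch is kept only if it merges at least two files.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_size = 0

    def flush() -> None:
        # Rewriting a single file would not reduce the file count, so skip it.
        if len(current) > 1:
            batches.append(list(current))

    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            continue  # already big enough, no need to compact
        if current and current_size + size > target_mb:
            flush()
            current.clear()
            current_size = 0
        current.append(size)
        current_size += size
    flush()
    return batches


if __name__ == "__main__":
    # Three small files fit in one 128 MB batch; the 100 MB and 200 MB files stay as they are.
    print(plan_compaction([10, 20, 200, 30, 100]))
```

Each returned batch lists the sizes of the files that would be rewritten as one bigger file.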

Console:

Aurora

Database Activity Streams:

Database Activity Streams

An activity stream stores all change and access events. Change events represent data modifications, such as INSERT or CREATE TABLE, whereas access events represent data reads, such as SELECT statements.
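The distinction between the two event kinds can be sketched with a tiny classifier. This is only an illustration of the definition above, not the format or the logic used by Database Activity Streams; the `EventKind` enum, the keyword sets, and the `classify_statement` function are hypothetical names I made up for the example.

```python
from enum import Enum


class EventKind(Enum):
    CHANGE = "change"
    ACCESS = "access"
    OTHER = "other"


# Hypothetical keyword sets, chosen for this illustration only.
CHANGE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "CREATE", "ALTER", "DROP"}
ACCESS_KEYWORDS = {"SELECT"}


def classify_statement(sql: str) -> EventKind:
    """Classify a SQL statement by looking at its first keyword only."""
    tokens = sql.strip().split(None, 1)
    if not tokens:
        return EventKind.OTHER
    keyword = tokens[0].upper()
    if keyword in CHANGE_KEYWORDS:
        return EventKind.CHANGE
    if keyword in ACCESS_KEYWORDS:
        return EventKind.ACCESS
    return EventKind.OTHER


if __name__ == "__main__":
    statements = (
        "INSERT INTO users VALUES (1)",
        "CREATE TABLE t (id INT)",
        "SELECT * FROM users",
    )
    for statement in statements:
        print(statement, "->", classify_statement(statement).value)
```

A first-keyword check is obviously naive (a CTE starting with WITH ends up in the "other" bucket), but it's enough to show which statements fall on which side.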

Backup

New supported data stores:

Batch

Two new features in the AWS Batch service:

Data Exchange

New features:

DataSync

DataSync is the service for data synchronization between on-premises and cloud storage, or between different cloud storage services. Recently it got some new connectors:

Database Migration Service

Source-related changes:

Target-related changes:

Other changes:

DocumentDB

DynamoDB

New features of DynamoDB:

EC2

Although it's not a pure data service, EC2 got a few interesting auto-scaling updates:

Besides, it also supports new instance types:

EMR

Serverless:

Cluster:

Security:

Studio:

EventBridge

A schema management feature:

Glue

DataBrew:

Crawlers:

FindMatches:

Jobs:

Kendra

There are 3 new features for this search engine:

Kinesis

Data Streams:

Firehose:

Lake Formation

Lambda

Triggers:

Graviton2 processor:

Security:

Ops:

Macie

Managed Service for Prometheus

Although it's not a pure data service, it's worth noting the general availability of the managed version of Prometheus.

MSK

Serverless:

Security:

Others:

Neptune

Some new features for the graph database:

Redshift

Serverless:

Client:

Data sharing:

Data types:

Other features:

S3

Events:

Storage classes:

Security:

S3 File Gateway:

S3 File Gateway

The gateway is a proxy providing access to virtually unlimited cloud storage over the SMB and NFS protocols.

Other features:

OpenSearch

RDS

Oracle:

Global:

Snow Family

SNS

SQS

Timestream

Azure

Backup

Security:

Batch

Cosmos DB

Cassandra API:

Cost management:

Other updates:

Data Explorer

A lot of announcements, mostly preview features promoted to general availability:

Data Factory

Two updates:

Event Hubs

Two general availability announcements:

Functions

Some news for the serverless offering:

HDInsight

Two network and one API change:

Monitor

It's not a pure data change, but it's worth noting that action rules were renamed to alert processing rules.

Key Vault

Although it's not a pure data service, it got 2 important security updates:

Managed Instance for Apache Cassandra

The service providing a managed infrastructure for Apache Cassandra is now generally available.

Purview

Service Bus

SQL Database

Hyperscale:

Database for PostgreSQL - Flexible Server:

Database for MySQL:

SQL Managed Instance:

Storage Account

Security:

Other features:

Stream Analytics

A few changes for High Availability and supported target data stores:

Synapse

Several changes for the service:

GCP

BigQuery

BigQuery Omni, a multi-cloud analytics solution, is now generally available.

Administration:

IO:

SQL:

Security:

Pricing:

BigQuery Transfer Service

Cloud Functions

Cloud SQL

SQL Server:

MySQL:

PostgreSQL:

Global:

Cloud Storage

Features:

Security:

Others:

Data Fusion

Security:

Ops:

Other features:

Data Loss Prevention

New detectors and connections:

Dataflow

Dataflow Prime is available in preview. This new runtime environment executes Apache Beam pipelines in a serverless mode. Apart from that, there are 2 other announcements:

Dataproc

Datastream

Firestore

IAM

Two interesting changes for this security service:

Pub/Sub

Spanner

Ops:

Other features:

Spark on Google Cloud

This is an autoscaling, serverless Spark service integrated with other GCP offerings. It's still in Private Preview.

Storage Transfer Service

This time too, there is a lot of exciting news for those of us who work on the cloud. If I had to pick the most important announcements, I would definitely go with the serverless changes for streaming brokers, data processing and data warehouse services.
