What's new on the cloud for data engineers - part 8 (09-12.2022)

This is the last update on cloud data engineering news this year. A lot has been released, especially for stream processing!

This post covers updates released through December 17th. The remaining two weeks will be included in the first update post next year. As usual, I'm trying to present an opinionated list here, with the most important updates highlighted in bold.

AWS

Athena

Performance:

Integration:

Misc:

Aurora

Backup

Batch

CloudWatch

Data Exchange

Data Sync

Database Migration Service

DocumentDB

DynamoDB

EC2

Although it's not a pure data service, EC2 got a few interesting updates if you run a data workload on it:

EMR

Kubernetes:

Serverless:

Others:

EventBridge

Fault Injection Simulator

Although it's not a pure data service, it's worth knowing as a way to check the reliability of your infrastructure.

Glue

Crawlers:

Others:

Kinesis

Firehose:

Data Analytics:

Lake Formation

Lambda

Processing:

Security:

Ops/Others:

Managed Grafana

Managed Workflows for Apache Airflow

MemoryDB

MSK

Connect:

Serverless:

Others:

Neptune

Some new features for the graph database:

OpenSearch

RDS

Oracle:

SQL Server:

MySQL:

PostgreSQL:

Global:

Redshift

Integration:

Others:

S3

Storage classes:

Outpost:

Security:

Other features:

Security Lake

It's a new service, available in Preview for now. Amazon Security Lake simplifies collecting and analyzing security data from AWS CloudTrail management events, Amazon Virtual Private Cloud (Amazon VPC) Flow Logs, Amazon Route 53 Resolver query logs, and AWS Security Hub.

SNS

SQS

Step Functions

Storage Gateway

Transfer Family

QuickSight

QuickSight Q:

Misc:

Azure

Backup

Cache for Redis

Cosmos DB

MongoDB:

PostgreSQL:

Cassandra:

Misc:

Data Explorer

Database Migration

Event Hubs

Functions

Monitor

Service Bus

SQL Database

Hyperscale:

PostgreSQL - Flexible Server:

MySQL - Flexible Server:

SQL Managed Instance:

Misc:

Storage Account

Security:

Misc:

Stream Analytics

Synapse

Several changes for the service:

GCP

BigQuery

Administration/OPS:

IO:

SQL:

Security:

Omni:

Other features:

BigQuery Transfer Service

Cloud Composer

Cloud Composer 2:

Security:

Some of the bug fixes:

Others:

Cloud Functions

Cloud SQL

SQL Server:

MySQL:

PostgreSQL:

PostgreSQL and MySQL:

Global:

Cloud Storage

Security:

Misc:

Data Catalog

Data Fusion

Data Loss Prevention

New detectors and connections:

Dataflow

Dataplex

Dataproc

Datastream

Firestore

IAM

Pub/Sub

Spanner

Change Data records:

Performance:

Querying:

Security:

Other features:

Storage Transfer Service

As usual, I highlighted my top picks from the most recent releases. I can see a lot of work on stream processing. The list of updates for Azure Stream Analytics has never been so long! AWS Athena can now query MSK, and Redshift can get data directly from MSK or Kinesis Data Streams! Besides, I have a feeling that the cloud providers are starting to take open table formats (Apache Iceberg, Apache Hudi, Delta Lake) more and more seriously. The added support for Apache Iceberg and the BigLake changes hint at more major changes in 2023. And finally, I also have a feeling of déjà vu. The zero-ETL integration making Aurora data available in Redshift looks very similar to Azure Synapse Link and to a more general concept called Hybrid Transactional/Analytical Processing, doesn't it?

