What's new on the cloud for data engineers - part 10 (03-05.2023)

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 3 months.

Bartosz Konieczny

This 10th part covers everything that happened between 10.03.2023 and 27.05.2023. As before, I highlighted the most interesting news.

AWS

Athena

Aurora

Backup

Batch

DataSync

Database Migration Service

DocumentDB

DynamoDB

ElastiCache

EMR

Kubernetes:

Security:

Serverless:

Others:

Glue

Crawlers:

Studio:

Others:

Kendra

Keyspaces

Kinesis

Firehose:

Lake Formation

Lambda

Processing:

Ops/Others:

MSK

Others:

Neptune

MemoryDB

Neptune

OpenSearch

RDS

SQL Server:

MySQL:

PostgreSQL:

Global:

Redshift

Others:

S3

Security:

Other features:

SNS

Timestream

QuickSight

Azure

Backup

Batch

Cache for Redis

Container Apps

Cosmos DB

MongoDB:

PostgreSQL

NoSQL

Misc

Data Explorer

Database Migration

Databricks

Event Grid

Event Hubs

Fabric

It's a new end-to-end, unified analytics service on Azure. It integrates other Azure technologies, including Azure Data Factory, Azure Synapse Analytics, and Power BI, into a single unified product. It has 7 different workloads that you can use for various use cases, such as real-time analytics, data orchestration, or data science.

Functions

Monitor

Purview

SQL Database

Hyperscale:

PostgreSQL:

SQL Managed Instance:

MySQL:

SQL Server on VM:

Security:

Misc:

Storage Account

Security:

Misc:

Storage Mover

Stream Analytics

Synapse

GCP

BigQuery

Administration/OPS:

IO:

SQL:

Security:

Other features:

Streaming:

Machine Learning:

BigQuery Transfer Service

Cloud Composer

Cloud Composer 2:

Security:

Bug fixes:

Others:

Cloud Functions

Cloud SQL

SQL Server:

MySQL:

PostgreSQL:

PostgreSQL and MySQL:

Global:

Cloud Storage

Data Loss Prevention

New detectors and connections:

Other changes:

Dataflow

Dataplex

Dataproc

Datastream

Firestore

IAM

Pub/Sub

Spanner

Other features:

Querying:

Storage Transfer Service

Microsoft Fabric is the most impactful change on the list. A unified set of services that simplifies the cloud data stack sounds great! Besides, there are other smaller but also interesting changes, such as vertical autoscaling for EMR on Kubernetes, Kafka Connect GA in Event Hubs, or GA lineage and CDC support in BigQuery!
