What's new on the cloud for data engineers - part 6 (01-04.2022)

It's time for the first cloud news blog post of the year. This summary lists the changes to data and data-related services released between January 1 and April 25.

Looking for a better data engineering position and skills?

Have you been working as a data engineer but feel stuck? Do you lack new challenges and keep writing the same jobs over and over? You have different options: you can look for a new job, now or later, or learn from others! The "Become a Better Data Engineer" initiative is one of the places where you can find online learning resources where theory meets practice. They will help you prepare for your next job or, at least, improve your current skillset without looking for something else.

👉 I'm interested in improving my data engineering skillset

See you there, Bartosz

I cover data services here, with a few exceptions that mostly concern security services. I omit version upgrades, which are frequent, especially for managed RDBMS services. This time, I also try to highlight the most important features, although the selection is subjective.

AWS

Athena

Aurora

Batch

Data Exchange

DocumentDB

DynamoDB

EC2

Although it's not a pure data service, EC2 got a few interesting auto-scaling updates:

Warm Pools

It's a pool of pre-initialized EC2 instances sitting alongside an Auto Scaling group. It helps decrease latency for applications with exceptionally long boot times, for example, because they write massive amounts of data to disk at startup.
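To make the idea more concrete, here is a minimal sketch of how a warm pool could be attached to an existing Auto Scaling group with boto3. The group name, the pool size and the helper function names are assumptions made for the illustration. The only AWS call involved is `put_warm_pool` of the `autoscaling` client, and it needs the SDK and valid credentials to run.

```python
from typing import Any, Dict

# Pool states accepted by the Auto Scaling PutWarmPool API.
VALID_POOL_STATES = {"Stopped", "Running", "Hibernated"}


def build_warm_pool_request(
    asg_name: str,
    min_size: int = 2,
    pool_state: str = "Stopped",
) -> Dict[str, Any]:
    """Build the keyword arguments for the PutWarmPool call (hypothetical helper)."""
    if pool_state not in VALID_POOL_STATES:
        raise ValueError(f"Unsupported pool state: {pool_state}")
    if min_size < 0:
        raise ValueError("min_size must be non-negative")
    return {
        "AutoScalingGroupName": asg_name,
        # Minimum number of pre-initialized instances kept in the pool.
        "MinSize": min_size,
        # State in which the pre-initialized instances wait for a scale-out.
        "PoolState": pool_state,
    }


def apply_warm_pool(request: Dict[str, Any]) -> None:
    """Send the request to AWS; requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without the SDK

    boto3.client("autoscaling").put_warm_pool(**request)
```

Calling `apply_warm_pool(build_warm_pool_request("my-asg", min_size=3))` would ask AWS to keep at least 3 stopped, pre-initialized instances next to the hypothetical `my-asg` group.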

Besides, it also supports new instance types:

EMR

Quite a lot of updates for Kubernetes:

ACID file formats:

Lake Formation:

Studio:

Scaling:

EventBridge

FSx

Although it's not a pure data service, one of its recent updates brings something good for databases:

Glue

Data Brew:

FindMatches:

Jobs:

Schema registry:

Kendra

There are 3 new features for this search engine:

Keyspaces

Two changes for the clients:

Kinesis

Two new sinks are available in Firehose:

Lambda

Processing:

Security:

Ops/Others:

Macie

Managed data identifier

The service supports different managed data identifiers. Each of them is designed to detect a specific type of sensitive data, such as credit card numbers or passport numbers.
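To give an intuition of what such an identifier does, here is a simplified sketch of a credit card number detector. It is not Macie's implementation, which isn't public. The sketch assumes a common approach: a regular expression to find candidates and the Luhn checksum to reject random digit sequences. All the names are hypothetical.

```python
import re
from typing import List

# Candidate: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        # Double every second digit, counting from the rightmost one.
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> List[str]:
    """Return the normalized card-like numbers found in the text."""
    matches = []
    for candidate in CARD_CANDIDATE.findall(text):
        digits = re.sub(r"[ -]", "", candidate)
        if luhn_valid(digits):
            matches.append(digits)
    return matches
```

For instance, `find_card_numbers("card 4111 1111 1111 1111")` returns the well-known test number `4111111111111111`, whereas a sequence like `1234 5678 9012 3456` is ignored because it fails the checksum.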

MSK

Performance:

Security:

Ops:

Redshift

Spectrum:

Data sharing:

Data types and language:

I/O:

Security:

Others:

S3

Security:

MemoryDB

Neptune

RDS

Oracle:

SQL Server:

MySQL:

PostgreSQL:

Global:

SNS

Step Functions

Storage Gateway

QuickSight

Azure

Backup

Batch

Cache for Redis

Cosmos DB

Synapse Link:

Security:

Misc:

Data Explorer

Data Box

Data Factory

Databricks

Functions

Some news for the serverless offering:

Monitor

Although it's not a pure data service, you might find some changes indirectly related to the data services:

Unsupported tables

Azure Monitor represents collected logs as tables. The supported tables are the ones currently available for exploration and export rules. An example is DatabricksWorkspace, which is an audit log for Databricks Workspace.

The unsupported tables, albeit coming from real Azure resources, are not ready yet for querying. For example, the Perf table stores performance counters from Windows and Linux agents. However, only the Windows data can be used in the export rules.

Key Vault

Although it's not a pure data service, it has an important quota update:

Purview

SQL Database

Hyperscale:

PostgreSQL:

SQL Managed Instance:

SQL Server on VM:

Security:

Misc:

Storage Account

Monitoring:

Misc:

Table Storage:

Stream Analytics

Synapse

GCP

BigQuery

Two major announcements for BigQuery:

SQL:

Other features:

Cloud Composer

Cloud Composer 2:

Bug fixes for:

Others:

Cloud Functions

Cloud SQL

SQL Server:

PostgreSQL:

Global:

Cloud Storage

Security:

Others:

Data Catalog

Data Fusion

Data Loss Prevention

New detectors and connections:

Dataflow

Dataplex

Dataproc

Firestore

IAM

Two interesting changes for this security service:

Pub/Sub

Spanner

Performance:

Other features:

Storage Transfer Service

Data innovation on the cloud continues. My top news items are Apache Iceberg on AWS, Kubernetes support on EMR, new data services on GCP (Dataplex, BigLake, Analytics Hub), optimized runtime environments (Dataflow Runner V2, Dataproc Serverless), and integrated ML capabilities (Stream Analytics). What are yours?

