Apache Spark 3.3.0 features articles

4-day workshop Β· In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that β€” unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants Β· production-ready templates
See the full curriculum β†’ €7,000 flat fee Β· cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

What's new in Apache Spark 3.3 - joins

Joins are probably the most popular operation for combining datasets and Apache Spark supports multiple types of them already! In the new release, the framework got 2 new strategies, the storage-partitioned and row-level runtime filters.

Continue Reading β†’

What's new in Apache Spark 3.3 - new functions

New Apache SQL functions are a regular position in my "What's new in Apache Spark..." series. Let's see what has changed in the most recent (3.3.0) release!

Continue Reading β†’

What's new in Apache Spark 3.3.0 - Data Source V2

After a break for the Data+AI Summit retrospective, it's time to return to Apache Spark 3.3.0 and see what changed for the DataSource V2 API.

Continue Reading β†’

What's new in Apache Spark 3.3.0 - Structured Streaming

Even though the Project Lightspeed is not there yet, Apache Spark Structured Streaming 3.3.0 has several interesting features that should make your daily life easier.

Continue Reading β†’

What's new in Apache Spark 3.3.0 - PySpark

It's time for the last "What's new in Apache Spark 3.3.0..." before a break. Today we'll see what changed in PySpark. Spoiler alert: Pandas users should find one feature very exciting!

Continue Reading β†’