It's time for the last "What's new in Apache Spark 3.3.0..." before a break. Today we'll see what changed in PySpark. Spoiler alert: Pandas users should find one feature very exciting!
Even though the Project Lightspeed is not there yet, Apache Spark Structured Streaming 3.3.0 has several interesting features that should make your daily life easier.
After a break for the Data+AI Summit retrospective, it's time to return to Apache Spark 3.3.0 and see what changed for the DataSource V2 API.
New Apache SQL functions are a regular position in my "What's new in Apache Spark..." series. Let's see what has changed in the most recent (3.3.0) release!
Joins are probably the most popular operation for combining datasets and Apache Spark supports multiple types of them already! In the new release, the framework got 2 new strategies, the storage-partitioned and row-level runtime filters.