Apache Spark 2.4.0 features articles

on waitingforcode.com
Articles tagged with Apache Spark 2.4.0 features. There are 7 article(s) corresponding to the tag Apache Spark 2.4.0 features.

Apache Spark 2.4.0 features - barrier execution mode

Data-driven systems change continuously. We moved from static, batch-oriented daily processing jobs to real-time, streaming-based pipelines that run all the time. Nowadays, workflows include more and more AI components. Apache Spark tries to keep pace, and the 2.4.0 release introduces barrier execution mode as a new way to schedule tasks.

Apache Spark 2.4.0 features - Avro data source

Apache Avro has become one of the serialization standards, in part because of its use in Apache Kafka's schema registry. Previously, working with Avro files in Apache Spark required Databricks' external package. That is no longer the case as of the 2.4.0 release, in which Avro became a first-class data source.