Apache Spark scalability articles on waitingforcode.com

There are 3 articles tagged with Apache Spark scalability. If you don't find what you're looking for, please check the related tags: Apache Spark 2.4.0 features, Apache Spark data sources, Apache Spark elasticity, Apache Spark internals, Apache Spark Structured Streaming joins, AWS EC2, Big Data patterns implemented, Bloom filters, completable future, data locality.

Apache Spark on Kubernetes - global overview

Recent years have marked the popularization of Kubernetes. Thanks to its replication and scalability properties, it is used more and more often in distributed architectures. Apache Spark, through a dedicated working group, is integrating Kubernetes steadily. In the current (2.3.1) version, this new way of scheduling jobs ships in the project as an experimental feature. Continue Reading →
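
Since the teaser mentions the experimental Kubernetes scheduler backend, a minimal submission sketch may help picture it. The API server URL and container image name below are placeholders rather than values from the article; the flags themselves follow the Spark 2.3 Kubernetes documentation.

```bash
# A minimal sketch of submitting a Spark 2.3.x job to Kubernetes. Only cluster
# mode is supported by the experimental 2.3 scheduler backend; the driver and
# each executor then run as pods. The API server URL and image are placeholders.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=spark:2.3.1 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.1.jar
```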

What Kubernetes can bring to Apache Spark pipelines?

The commercial version of Apache Spark distributed by Databricks offers a serverless, auto-scaling platform for applications written in the framework. Over time, other companies have tried to provide similar alternatives, going as far as putting Apache Spark pipelines into AWS Lambda functions. But with version 2.3.0, another answer to the scalability and elasticity problem appeared: Kubernetes. Continue Reading →
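
To illustrate the elasticity argument, the sketch below relies on a label that Spark's Kubernetes backend sets on executor pods; once a job is submitted, plain kubectl shows executors appearing and disappearing as pods. The command assumes a configured kubectl context and is not taken from the article itself.

```bash
# A minimal sketch: with the Kubernetes backend, every Spark executor is a pod,
# so ordinary Kubernetes tooling exposes the application's scaling behaviour.
# spark-role=executor is the label Spark applies to executor pods.
kubectl get pods -l spark-role=executor --watch
```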