Kubernetes concepts for Apache Spark

Versions: Apache Spark 3.2.0

I had the idea for this blog post when I was preparing the "What's new in Apache Spark..." series. At that time, I was writing about Kubernetes in the context of Apache Spark but had to "google" a lot of things on the side - mostly Kubernetes API terms.

Watchers

Apache Spark interacts with the Kubernetes API via the Fabric8 library, which provides a user-friendly way to communicate with the cluster. The first term you can encounter in the code base is the Watcher. It's a listener-oriented interface implemented in Apache Spark by the following classes:
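Whatever the concrete implementation, a watcher receives the lifecycle events of the resources it observes. Below is a minimal, hypothetical sketch of such a watcher, assuming the Fabric8 5.x Watcher signature (older client versions pass a KubernetesClientException to onClose); it only logs the events:

import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{Watcher, WatcherException}
import io.fabric8.kubernetes.client.Watcher.Action

// Hypothetical watcher that only logs the lifecycle events delivered by the
// Kubernetes API server for the observed pods.
class LoggingPodsWatcher extends Watcher[Pod] {

  override def eventReceived(action: Action, pod: Pod): Unit = {
    // action is one of ADDED, MODIFIED, DELETED, ERROR
    println(s"Pod ${pod.getMetadata.getName} changed: $action")
  }

  override def onClose(cause: WatcherException): Unit = {
    println(s"Watch closed: ${Option(cause).map(_.getMessage).getOrElse("gracefully")}")
  }
}

Conceptually, such a watcher gets registered through the Fabric8 client with something like client.pods().withLabel("spark-role", "executor").watch(new LoggingPodsWatcher()).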

Executor pod snapshot store

One of the watchers updates the executor pod snapshot store. It's represented by the ExecutorPodsSnapshotsStore interface and, so far, it has a single implementation in the code base, the ExecutorPodsSnapshotsStoreImpl. It follows an event-driven paradigm: the listeners read the information about the executor pods and trigger the corresponding actions to update the store.
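To give an idea of that event-driven flow, here is a simplified, hypothetical store (not Spark's implementation); the updatePod/replaceSnapshot names are only loosely modeled on the real interface:

import io.fabric8.kubernetes.api.model.Pod
import scala.collection.mutable

// Simplified, hypothetical snapshot store: watchers push single-pod updates,
// a polling source can replace the whole snapshot, and subscribers are
// notified with a consistent view of the executor pods.
class SimplePodsSnapshotsStore {
  private val currentPods = mutable.Map[String, Pod]()
  private val subscribers = mutable.Buffer[Map[String, Pod] => Unit]()

  // Incremental update coming from a watcher event.
  def updatePod(pod: Pod): Unit = synchronized {
    currentPods(pod.getMetadata.getUid) = pod
    notifySubscribers()
  }

  // Full resync coming from a periodic listing of the cluster state.
  def replaceSnapshot(pods: Seq[Pod]): Unit = synchronized {
    currentPods.clear()
    pods.foreach(pod => currentPods(pod.getMetadata.getUid) = pod)
    notifySubscribers()
  }

  def addSubscriber(onNewSnapshot: Map[String, Pod] => Unit): Unit = synchronized {
    subscribers += onNewSnapshot
  }

  private def notifySubscribers(): Unit = subscribers.foreach(_(currentPods.toMap))
}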

There are 2 snapshot events:

The information stored there comes from Fabric8's Pod class, which brings, among other things, the pod metadata, specification and status. Each Pod instance is wrapped into Apache Spark's ExecutorPodState, which is one of:

import io.fabric8.kubernetes.api.model.Pod

sealed trait ExecutorPodState {
  def pod: Pod
}
case class PodRunning(pod: Pod) extends ExecutorPodState
case class PodPending(pod: Pod) extends ExecutorPodState
sealed trait FinalPodState extends ExecutorPodState
case class PodSucceeded(pod: Pod) extends FinalPodState
case class PodFailed(pod: Pod) extends FinalPodState
case class PodDeleted(pod: Pod) extends FinalPodState
case class PodTerminating(pod: Pod) extends FinalPodState
case class PodUnknown(pod: Pod) extends ExecutorPodState
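Thanks to the sealed hierarchy, client code can easily tell terminal states apart. A hypothetical helper (not part of Spark) could look like this:

// Hypothetical helper: any FinalPodState means the executor reached a terminal
// state and can be removed from the tracking structures.
def isFinished(state: ExecutorPodState): Boolean = state match {
  case _: FinalPodState => true
  case _ => false
}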

Pod template

The next item is a pure Kubernetes feature. A pod template is a way to generalize pod definitions. The feature has been available since Apache Spark 3.0 and responds to the increased need for customization. Before this change, every new customization required adding a new configuration option to Apache Spark, meaning more code to maintain and a bigger gap with the Kubernetes declarative model.

A pod template is a YAML definition (template) that can be reused by different jobs. One of the most self-explanatory examples I've found in the context of Apache Spark is the sidecar container pattern, running the executor alongside a monitoring container. You can find the full code example in this EMR documentation page.
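To use a template, you point the driver and/or executor pods to its file. The sketch below shows the corresponding configuration keys expressed with SparkConf; the file paths are hypothetical and, in cluster mode, these entries are usually passed to spark-submit with --conf, since the driver pod is created before any user code runs:

import org.apache.spark.SparkConf

// Hypothetical paths; the template files must be accessible to spark-submit.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.podTemplateFile", "/opt/templates/driver-template.yaml")
  .set("spark.kubernetes.executor.podTemplateFile", "/opt/templates/executor-template.yaml")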

Not all pod template properties can be overwritten in the Apache Spark context. Some of them, like the restartPolicy (always set to Never), the volumes or the service account, are always managed by Apache Spark. The full list is available in the Pod Template properties documentation.

Volumes

Pod storage is ephemeral: it disappears when the pod goes offline. To bring some persistence to the data, Kubernetes uses volumes, which are also available for Apache Spark jobs. The framework supports the following Kubernetes volume types:

Apache Spark configures the volumes via configuration entries starting with spark.kubernetes.driver.volumes or spark.kubernetes.executor.volumes, as in the sketch below.
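For example, mounting an existing PersistentVolumeClaim on the executors could look like this; the volume name (checkpoint), the claim name and the mount path are all hypothetical:

import org.apache.spark.SparkConf

// Assumes a PersistentVolumeClaim named "checkpoint-pvc" already exists in the
// job's namespace; "checkpoint" is the arbitrary volume name used in the keys.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpoint.options.claimName", "checkpoint-pvc")
  .set("spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpoint.mount.path", "/checkpoint")
  .set("spark.kubernetes.executor.volumes.persistentVolumeClaim.checkpoint.mount.readOnly", "false")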

Service accounts

The last interesting concept is the service account. Why would an Apache Spark job ever need a service account? If you think about managed Apache Spark cloud services like EMR, you'll immediately understand the reason. An EMR cluster also needs a set of permissions, for example to provision the EC2 instances backing a new cluster, or to access the data processing resources (S3, DynamoDB, Kinesis Data Streams, ...) used by the job.

Well, the service account in Apache Spark on Kubernetes is similar. It defines what the given pod can do. A driver pod will need the permissions to create and watch executor pods, services and config maps.

A service account alone won't be enough to authorize the driver to create executor pods. In addition to creating the account, you will have to create a corresponding Role or ClusterRole and bind it to the service account. The two differ in scope: a ClusterRole applies cluster-wide, whereas a Role works only inside a namespace.
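On the Apache Spark side, the only thing left is to tell the driver which service account to use. A minimal sketch, assuming a service account named spark already exists in a data-jobs namespace and is bound to a role allowing it to create and watch pods, services and config maps:

import org.apache.spark.SparkConf

// Assumes the "spark" service account and its role binding were created
// beforehand (e.g. with kubectl); both names are hypothetical.
val conf = new SparkConf()
  .set("spark.kubernetes.namespace", "data-jobs")
  .set("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")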

You will find the presented concepts in the video below:

The article started with two low-level implementation details of the Kubernetes resource manager. You discovered that Apache Spark uses listeners to keep track of all the cluster changes. The next three parts covered more high-level concepts: pod templates, the various volume types and service accounts. I hope that, altogether, it helped you get a better understanding of the Apache Spark on Kubernetes integration!