Vertical autoscaling for data processing on the cloud

Versions:
- Optimizing Kubernetes Vertical Pod Autoscaler responsiveness: https://www.cncf.io/blog/2023/02/24/optimizing-kubernetes-vertical-pod-autoscaler-responsiveness/
- Amazon EMR on EKS launches vertical autoscaling to auto-tune application resources: https://aws.amazon.com/about-aws/whats-new/2023/05/amazon-emr-eks-vertical-auto-scaling/
- Vertical Autoscaling (Dataflow): https://cloud.google.com/dataflow/docs/vertical-autoscaling

The "vertical scaling" has caught my attention a few times already when I have been reading about cloud updates. I've always considered horizontal scaling as the single true scaling policy for elastic data processing pipelines. Have I been wrong?

Old days

Horizontal and vertical scaling have a different focus. The former optimizes the workload by adding extra nodes to the cluster. It's a good candidate if you scaled your data source (e.g. added new partitions) or got more data to read (e.g. more files for unpartitioned data sources) and need some extra compute power to handle that extra load. Vertical scaling is different. It doesn't change the number of nodes but their type. Typically, if your job uses only 50% of the allocated memory, runs close to its allocation limits, or struggles with asynchronous computations, adding extra machines helps only when the workload can be distributed. If it can't, the new nodes will remain idle and you will need a different strategy, the vertical one.

So far, the only way to modify the composition of a running cluster has been to stop it, change the underlying node types, and restart it. As you can imagine, this involves downtime. But this is the past!

Vertical autoscaling

Modern data processing offerings, including EMR, Databricks, and Dataflow (I'm still unsure about Dataflow's compute environment), use Kubernetes under the hood. And Kubernetes has great autoscaling support. There is a Horizontal Pod Autoscaler for horizontal scaling. There is a Cluster Autoscaler that automatically adapts the cluster capacity to the scaling needs. Finally, there is also a Vertical Pod Autoscaler (VPA) that supports the vertical approach.

How does it work? The VPA analyzes historical resource utilization, the amount of resources available in the cluster, and real-time events such as OOMs. It uses the analysis outcome to adjust the pod requirements in real time with the goal of improving the overall cluster utilization.

The VPA architecture has 3 main components: the Recommender that analyzes current and past resource consumption and produces recommended values for the pods, the Updater that detects pods running with outdated resources and evicts them, and the Admission Controller that applies the recommended resources to newly created pods.

The resource management mode is controlled by an updatePolicy configuration. With the "Initial" setting, the VPA only overwrites the resources at the pod's creation time. When you want to extend this behavior with runtime adjustments, where the Updater replaces running pods, you set the policy to "Auto". And if you need only the resource recommendations, without any update action, you set the policy to "Off".
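To make this more concrete, here is a minimal sketch of driving the VPA programmatically with the official kubernetes Python client. The namespace, the VPA name and the targeted "spark-driver" Deployment are made up for the illustration, and it assumes the VPA Custom Resource Definitions (autoscaling.k8s.io/v1) are already installed in the cluster:

# A minimal sketch, assuming the VPA CRDs are installed; names are hypothetical.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a pod

vpa_body = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "spark-driver-vpa"},
    "spec": {
        # The workload whose pods the VPA should watch and resize.
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "spark-driver",
        },
        # "Initial" = set resources only at pod creation time,
        # "Auto"    = also replace running pods with resized ones,
        # "Off"     = compute recommendations without applying them.
        "updatePolicy": {"updateMode": "Auto"},
    },
}

api = client.CustomObjectsApi()
api.create_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    body=vpa_body,
)

# Later on, the Recommender's output shows up in the VPA object's status.
vpa = api.get_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    name="spark-driver-vpa",
)
print(vpa.get("status", {}).get("recommendation"))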

The update consists of replacing the pods, i.e. shutting down the old ones and creating their replacements. It may therefore involve some downtime, but the impact should be smaller than with the "Old days" method.

Vertical autoscaling and data processing

I didn't mention the VPA without reason. AWS EMR on EKS uses it in the Vertical autoscaling feature. All you need to do, besides installing the VPA of course, is to add these 2 configurations to the driver while submitting the job:

"spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing":"true",
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature":"job1"

The former parameter enables or disables the dynamic sizing while the latter provides a unique id for the job's resources. You can also extend the configuration with more VPA-related entries to, for example, limit the memory or CPU that can be allocated, or configure the VPA updatePolicy.
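As an illustration, the snippet below sketches what such a submission could look like with boto3's emr-containers client. Only the two dynamic sizing properties quoted above come from the documentation; the virtual cluster id, the execution role, the release label and the entry point are placeholders to replace with your own values:

# A hedged sketch of a job submission; placeholder values are not real resources.
import boto3

emr_containers = boto3.client("emr-containers", region_name="us-east-1")

emr_containers.start_job_run(
    name="vertically-autoscaled-job",
    virtualClusterId="<virtual-cluster-id>",
    executionRoleArn="<execution-role-arn>",
    releaseLabel="emr-6.10.0-latest",
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://<bucket>/job1.py",
        }
    },
    configurationOverrides={
        "applicationConfiguration": [
            {
                "classification": "spark-defaults",
                "properties": {
                    # Enables dynamic sizing for the job's driver...
                    "spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing": "true",
                    # ...and identifies the job so its resource profile can be tracked across runs.
                    "spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature": "job1",
                },
            }
        ]
    },
)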

Besides EMR on EKS, there is another service supporting vertical autoscaling. GCP Dataflow Prime uses the feature to fine-tune the memory usage and avoid OOM errors. The feature replaces existing workers with new ones that are better adapted to the real needs. Each vertical autoscaling event deactivates horizontal autoscaling for 10 minutes. Unlike the VPA, the feature on Dataflow impacts only the memory.
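For the Dataflow side, here is a hedged sketch of a Beam Python pipeline opting into Dataflow Prime, the offering that ships vertical autoscaling. The project, region and bucket are placeholders, and it assumes the documented enable_prime service option is what turns the feature on:

# A minimal sketch, assuming the enable_prime service option; placeholders are not real resources.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="<project-id>",
    region="us-central1",
    temp_location="gs://<bucket>/tmp",
    # Opts the job into Dataflow Prime, and with it memory-only vertical autoscaling.
    dataflow_service_options=["enable_prime"],
)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://<bucket>/input/*.txt")
        | "SplitWords" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "Sum" >> beam.CombinePerKey(sum)
        | "Write" >> beam.io.WriteToText("gs://<bucket>/output/counts")
    )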

Relocating your workers to more efficient nodes is a great cost-saving and workload-optimization feature. However, it seems that neither feature is task-aware, so a worker may be replaced while it's running a task.
