First steps with Apache Spark on K8s (part #3): GCP Spark Operator and auto scaling

Everything you read here is personal opinion or a gap in my knowledge :) Please feel free to contact me to fix any incorrect parts.

In previous blog posts I described how to set up an environment for Kubernetes and how to auto scale standalone Spark. In this post, let's take a look at the GCP Spark Operator. Join my journey with K8s and Spark! The operator versions used are shown below:
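The original version listing is not reproduced here; one way to check the chart and operator versions in your own cluster (the deployment name is an assumption and may differ per installation) is:

```shell
# List installed helm releases with their chart and app versions
helm list

# Inspect the operator image tag used by the deployment (name is assumed)
kubectl describe deployment sparkoperator | grep Image
```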

Installation is pretty simple: just use helm. I'm not providing a namespace, as I'm using the default one:
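A minimal sketch of the helm installation, assuming the GoogleCloudPlatform chart repository (the repo URL, chart name, and release name are assumptions; check the operator docs for the current ones):

```shell
# Add the spark-on-k8s-operator chart repository (assumed URL)
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm repo update

# Install into the default namespace (no --namespace flag on purpose)
helm install sparkoperator spark-operator/sparkoperator
```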

Double-check that the deployment was successful:
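A quick way to verify, assuming the release name used above:

```shell
# The operator pod should be in the Running state
kubectl get pods

# Helm should report the release as deployed
helm status sparkoperator
```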

To understand how horizontal auto scaling works in the GCP Spark Operator, I will use several scripts:

As the first step, the service account and cluster role binding should be created, and then the operator application can be submitted:
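A sketch of the service account and cluster role binding (the names are placeholders, and the built-in `edit` cluster role is used here for brevity; a dedicated, narrower role is preferable in production):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: spark-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: spark
    namespace: default
```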

Horizontal auto scaling

Check the example application:
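The operator ships a spark-pi example; a trimmed sketch of such a SparkApplication manifest is shown below (the image tag, jar path, and Spark version are assumptions and should match your operator release):

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: gcr.io/spark-operator/spark:v3.1.1
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar
  sparkVersion: "3.1.1"
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 512m
```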

What is the difference between the provided example and this small dummy app?

A dynamicAllocation block is added to the specification, which describes the scaling preferences; this block actually defines the horizontal auto scaling parameters. In this case it requires from 2 to 10 executors. An arguments block is also added, which is used for input parameters. Here, 10 is a relatively small random-sample input parameter for this app, so it will scale up, but not to the maximum number of replicas.
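The relevant part of the spec might look like this sketch (field names follow the SparkApplication CRD; the argument value 10 is the sample size mentioned above):

```yaml
spec:
  arguments:
    - "10"              # small input, so the app scales, but not to the maximum
  dynamicAllocation:
    enabled: true
    initialExecutors: 2
    minExecutors: 2
    maxExecutors: 10
```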

Probably the best tool for graphically representing pod scaling is the Minikube dashboard, but in my case I will use a simple console:
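Watching from the console can be as simple as:

```shell
# Watch executor pods appear and disappear as the app scales
kubectl get pods -w

# Or check the application status reported by the operator
kubectl get sparkapplications
```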

The next app is exactly the same, just with the input argument set to 100000, which generates a bit more load :)
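Only the argument changes in the second manifest, for example:

```yaml
spec:
  arguments:
    - "100000"          # larger sample, more tasks, so more executors are requested
```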

Vertical auto scaling

There is no such term as vertical auto scaling in the K8s world, but my understanding and personal opinion is that a pod can scale within itself given the provided resource requests and resource limits. That is probably only possible with operators:
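As an illustration of "scaling inside the pod", the driver spec does accept both a CPU request and a CPU limit; a sketch, assuming the `coreLimit` field of the SparkApplication CRD:

```yaml
spec:
  driver:
    cores: 1            # CPU request
    coreLimit: "1200m"  # CPU limit; the pod can burst up to this under load
    memory: 512m
```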

I haven't found an option to set "vertical" scaling on the executor pod, something like cores/coresLimit and memory/memoryLimit. Some similar configuration items exist, but they actually represent different functionality.

Conclusion: horizontal auto scaling of Spark workers with the GCP Spark Operator is pretty straightforward and simple. No HPA object was created during scaling, which is a bit strange. Overall it's a great user experience, with decent documentation and a wide community. Personally, I love it!
