First steps with Apache Spark on K8s (part #1): preparing the environment

Everything you read here is personal opinion or a gap in my knowledge :) Please feel free to contact me so I can fix the incorrect parts.

To start working with Spark on Kubernetes, the first step is to prepare a sandbox. In my case I'm using a Windows 10 machine with Hyper-V. I can dedicate 16 GB of RAM and 8 threads to this sandbox.

As the OS I will use CentOS 7 Minimal.

The first preparation step is to install all packages that might be required during installation or configuration.
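A minimal sketch of what that could look like; the exact package list is my assumption and will depend on your setup (dnf is pulled in from EPEL here, since the Docker repo is added with dnf later on):

```bash
# Install common prerequisites (adjust the list to your needs)
sudo yum install -y epel-release
sudo yum install -y dnf wget curl vim net-tools
```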

Let's create a dedicated user and switch to it, since we're going to do the whole installation as that user.
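A sketch of the user setup; the user name "spark" and the wheel group for sudo access are my assumptions:

```bash
# Create the user and set a password ("spark" is an arbitrary name)
sudo useradd spark
sudo passwd spark

# Grant sudo rights via the wheel group, then switch to the new user
sudo usermod -aG wheel spark
su - spark
```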

For the next step I will (see the sketch below):

  • add the Docker dnf repo
  • install Docker
  • start the service
  • add a firewall masquerade rule
  • change the mode of docker.sock

The mode change is needed so that our non-root user can work with the Docker socket.
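A sketch of those steps, assuming the official Docker CE repo (on a stock CentOS 7 you may need yum-config-manager instead of dnf):

```bash
# Add the Docker CE repo and install Docker
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y docker-ce docker-ce-cli containerd.io

# Start the service now and enable it at boot
sudo systemctl enable --now docker

# Allow masquerading so container traffic can reach the outside
sudo firewall-cmd --permanent --zone=public --add-masquerade
sudo firewall-cmd --reload

# Change the mode of the Docker socket so a non-root user can use it
sudo chmod 666 /var/run/docker.sock
```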

If you have issues with docker ps, make sure the last step (the mode change) is completed. More can be found at: https://stackoverflow.com/questions/48957195/how-to-fix-docker-got-permission-denied-issue

In the next step I will install conntrack, which minikube expects to find on the host.
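Something like this should do (conntrack is available from the base repos):

```bash
# minikube's preflight checks look for conntrack on the host
sudo dnf install -y conntrack
```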

Let’s install minikube now:
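A sketch following the official installation instructions, using the latest release binary:

```bash
# Download the latest minikube binary and put it on the PATH
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
```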

It's almost there. Next, start minikube!
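A sketch of the start command; the driver and the resource sizing are my assumptions, leaving some headroom for the host out of the 16 GB / 8 threads:

```bash
# Start a single-node cluster on the Docker driver
minikube start --driver=docker --cpus=6 --memory=12g
```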

It's worth checking which addons are enabled. For functionality like HPA (Horizontal Pod Autoscaler), the metrics-server addon might be required.
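For example:

```bash
# Show all addons and their current status
minikube addons list

# Enable metrics-server, which HPA relies on
minikube addons enable metrics-server
```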

The final step in environment preparation is installing Helm:
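A sketch using the official Helm 3 installer script:

```bash
# Download and run the Helm 3 install script
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

# Verify the installation
helm version
```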

Done!
