ChainerMN on Kubernetes with GPUs

By Shingo Omura
May 10, 2018

Kubernetes is today the most popular open-source system for automating the deployment, scaling, and management of containerized applications. With the rise of Kubernetes, many companies now run it as a platform for a variety of workloads, including web applications, databases, and cron jobs. Machine learning workloads, including deep learning workloads, are no exception, even though they require special hardware such as GPUs.

Kubernetes can schedule NVIDIA GPUs by default, so single-node Chainer workloads are straightforward: you simply launch a Pod or a Job with an nvidia.com/gpu resource request.
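
As a minimal sketch of such a single-node request, the shell function below prints a Pod manifest that asks for GPUs through an nvidia.com/gpu resource limit. The function name, the pod name, and the use of nvidia-smi as the container command are illustrative assumptions rather than part of the setup described in this article.

```shell
# Print a Pod manifest that requests GPUs via the nvidia.com/gpu resource.
# Arguments (hypothetical example values): pod name, image, GPU count.
gpu_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: $2
    # Assumption: nvidia-smi is available inside the container at runtime.
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: $3
EOF
}

# Preview the manifest on stdout.
gpu_pod_manifest chainer-gpu-test chainer/chainer:v4.0.0-python3 1
```

To submit the Pod instead of previewing it, the output could be piped to kubectl apply -f -.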

However, running ChainerMN on Kubernetes is not straightforward, because it requires us to set up an MPI cluster. Kubeflow can be a big help here. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. For more about Kubeflow, please refer to the two helpful slide decks presented at KubeCon + CloudNativeCon Europe 2018.

In this article, I would like to explain how to run ChainerMN workloads on Kubernetes with the help of Kubeflow.

How to run ChainerMN on Kubernetes

I explain the procedure in three steps:

Step 1. Build Your Container Image
Step 2. Install Kubeflow’s OpenMPI package
Step 3. Run ChainerMN on Kubernetes

Prerequisites

- A Kubernetes cluster equipped with NVIDIA GPUs
- On your local machine:
  - docker
  - kubectl
  - ksonnet

Step 1. Build Your Container Image

First, we need to build a container image that runs your deep learning workload with ChainerMN. We can simply follow the official ChainerMN installation guide.

For Chainer and CuPy, the official Docker image chainer/chainer is available on Docker Hub. It is very handy as a base image or runtime image for deep learning workloads because it is already nvidia-docker ready.

Below is a sample Dockerfile that installs CUDA-aware OpenMPI, ChainerMN, and its sample train_mnist.py script. Please save the contents in a file named Dockerfile.

FROM chainer/chainer:v4.0.0-python3

ARG OPENMPI_VERSION="2.1.3"
ARG CHAINER_MN_VERSION="1.2.0"

# Install basic dependencies and locales
RUN apt-get update && apt-get install -yq --no-install-recommends \
      locales wget sudo ca-certificates ssh build-essential && \
    rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/* && \
    echo "en_US.UTF-8 UTF-8" > /etc/locale.gen && locale-gen

# Install OpenMPI with CUDA support and verify that the build is CUDA-aware
RUN cd /tmp && \
  wget -q https://www.open-mpi.org/software/ompi/v${OPENMPI_VERSION%\.*}/downloads/openmpi-$OPENMPI_VERSION.tar.bz2 && \
  tar -xjf openmpi-$OPENMPI_VERSION.tar.bz2 && \
  cd /tmp/openmpi-$OPENMPI_VERSION && \
  ./configure --prefix=/usr --with-cuda && make -j2 && make install && rm -r /tmp/openmpi-$OPENMPI_VERSION* && \
  ompi_info --parsable --all | grep -q "mpi_built_with_cuda_support:value:true"

# Install ChainerMN
RUN pip3 install chainermn==$CHAINER_MN_VERSION

# Download train_mnist.py example of ChainerMN
# In practice, you would download your own code here.
RUN mkdir -p /chainermn-examples/mnist && \
  cd /chainermn-examples/mnist && \
  wget https://raw.githubusercontent.com/chainer/chainermn/v${CHAINER_MN_VERSION}/examples/mnist/train_mnist.py
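
The download URL in the Dockerfile relies on the shell parameter expansion ${OPENMPI_VERSION%\.*}, which removes the shortest trailing match of a dot followed by anything, turning a full version into its release series. The sketch below reproduces that expansion in a plain shell so it can be checked outside of a Docker build; the SERIES and URL variable names are introduced here only for illustration.

```shell
OPENMPI_VERSION="2.1.3"

# "%\.*" strips the shortest suffix matching ".<anything>", i.e. the patch part.
SERIES="v${OPENMPI_VERSION%\.*}"
echo "${SERIES}"

# The URL that the wget line in the Dockerfile assembles from the same pieces.
URL="https://www.open-mpi.org/software/ompi/${SERIES}/downloads/openmpi-${OPENMPI_VERSION}.tar.bz2"
echo "${URL}"
```

When changing OPENMPI_VERSION through --build-arg, this is a quick way to confirm that the series directory in the URL is what you expect.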

Then, you are ready to build and publish your container image.

# Building takes some time (probably 10-15 minutes). Please enjoy a ☕️.
docker build . -t YOUR_IMAGE_HERE
docker push YOUR_IMAGE_HERE

Step 2. Install Kubeflow’s OpenMPI package

Kubeflow’s OpenMPI package lets us launch an OpenMPI cluster on Kubernetes very easily.

Kubeflow’s OpenMPI package has not been officially released yet, but it is already available in the master branch of the Kubeflow repository, so let’s use it. Please note that this package is still under development.

Kubeflow depends on ksonnet. If you’re not familiar with ksonnet, I recommend following the official tutorial.

The steps are very similar to those described in Kubeflow’s OpenMPI package. I modified the original steps slightly because we have to use a specific commit of the Kubeflow repository.

NOTE: If you hit GitHub API rate limit errors, please set up GITHUB_TOKEN as described here.

# Create a namespace for kubeflow deployment.
NAMESPACE=kubeflow
kubectl create namespace ${NAMESPACE}

# Generate one-time ssh keys used by Open MPI.
SECRET=openmpi-secret
mkdir -p .tmp
yes | ssh-keygen -N "" -f .tmp/id_rsa
kubectl delete secret ${SECRET} -n ${NAMESPACE} || true
kubectl create secret generic ${SECRET} -n ${NAMESPACE} --from-file=id_rsa=.tmp/id_rsa --from-file=id_rsa.pub=.tmp/id_rsa.pub --from-file=authorized_keys=.tmp/id_rsa.pub

# Which version of Kubeflow to use.
# For a list of releases refer to:
# https://github.com/kubeflow/kubeflow/releases
# (Specific commit hash is specified here.)
VERSION=e2fbf9e25e087eeb6ee1f9414526c6ed917c4bf9

# Initialize a ksonnet app. Set the namespace for its default environment.
APP_NAME=chainermn-example
ks init ${APP_NAME}
cd ${APP_NAME}
ks env set default --namespace ${NAMESPACE}

# Install Kubeflow components.
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/openmpi@${VERSION}

Step 3. Run ChainerMN on Kubernetes

Now we are ready to run train_mnist.py in a distributed fashion! Following the standard ksonnet workflow, we first generate a train-mnist component from the openmpi prototype.

When generating a component, we can specify several parameters. In this example, we specify

train-mnist for its name,
4 workers,
1 GPU for each worker, and
an mpiexec ... train_mnist.py command for its exec param.

Then, the ks apply command deploys our OpenMPI cluster on the Kubernetes cluster.

Please note that this step requires authorization to create service accounts and cluster role bindings for the “view” cluster role. If you do not have such authorization, you will have to ask your administrator to create a service account that is granted the ‘get’ verb on ‘pods’ resources. Once that service account is ready, set it as the serviceAccountName param of the train-mnist component.
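
For the restricted case, the manifest an administrator would need might look like the sketch below, which prints a ServiceAccount, a Role that allows the ‘get’ verb on ‘pods’, and a RoleBinding that connects the two. The helper name and the resource names (openmpi-pod-reader, pod-reader) are hypothetical; only the ‘get’ on ‘pods’ requirement comes from the text above.

```shell
# Print RBAC manifests for a service account that may only 'get' pods.
# Arguments: namespace, service account name (both example values).
rbac_manifest() {
  ns="$1"; sa="$2"
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: ${ns}
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${sa}-pod-reader
  namespace: ${ns}
subjects:
- kind: ServiceAccount
  name: ${sa}
  namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
EOF
}

# Preview the manifests on stdout.
rbac_manifest kubeflow openmpi-pod-reader
```

An administrator could pipe the output to kubectl apply -f -. After the component has been generated, the account name could then be passed to it with ks param set train-mnist serviceAccountName openmpi-pod-reader.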

# See the list of supported parameters.
ks prototype describe openmpi

# Generate openmpi components.
COMPONENT=train-mnist
IMAGE=YOUR_IMAGE_HERE
WORKERS=4
GPU=1
EXEC="mpiexec -n ${WORKERS} --hostfile /kubeflow/openmpi/assets/hostfile --allow-run-as-root --display-map -- python3 /chainermn-examples/mnist/train_mnist.py -g"
ks generate openmpi ${COMPONENT} --image ${IMAGE} --secret ${SECRET} --workers ${WORKERS} --gpu ${GPU} --exec "${EXEC}"

# Deploy to your cluster.
ks apply default

# To clean up, run the two commands below.
# ks delete default
# kubectl delete secret ${SECRET} -n ${NAMESPACE}
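
The example uses the same ${WORKERS} value for both mpiexec -n and --workers. One way to keep the two in sync when experimenting with different cluster sizes is to derive the command string from a single variable, as in the sketch below. The build_exec helper is a hypothetical convenience function, not part of ksonnet or Kubeflow.

```shell
# Hypothetical helper: build the mpiexec command line for a given worker count.
# Arguments: number of workers, path of the training script inside the image.
build_exec() {
  workers="$1"; script="$2"
  echo "mpiexec -n ${workers} --hostfile /kubeflow/openmpi/assets/hostfile --allow-run-as-root --display-map -- python3 ${script} -g"
}

WORKERS=4
EXEC="$(build_exec "${WORKERS}" /chainermn-examples/mnist/train_mnist.py)"

# Inspect the command before handing it to `ks generate ... --exec "${EXEC}"`.
echo "${EXEC}"
```

Quoting "${EXEC}" on the ks generate line matters because the string contains spaces and must reach ksonnet as a single argument.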

This launches one master pod, four worker pods, and some supplemental resources. Once the train-mnist-master pod reaches the Running state, the training logs become available.

# Inspect pods status
# Wait until all pods are 'Running'
kubectl get pod -n ${NAMESPACE} -o wide
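
Instead of checking the pod list by eye, the waiting step can be scripted. The sketch below counts pods whose STATUS column is not Running; the count_not_running helper is a hypothetical name, and the two input lines are canned examples standing in for real kubectl get pod --no-headers output.

```shell
# Count pods whose STATUS column (3rd field) is not "Running".
# Reads the output of `kubectl get pod --no-headers` on stdin.
count_not_running() {
  awk '$3 != "Running" { n++ } END { print n + 0 }'
}

# Example with canned input instead of a live cluster (prints 1).
printf '%s\n' \
  'train-mnist-master 1/1 Running 0 1m' \
  'train-mnist-worker-0 0/1 Pending 0 1m' | count_not_running
```

Against a real cluster, the input would come from kubectl get pod -n ${NAMESPACE} --no-headers, and the check could be repeated in a loop with a short sleep until the count reaches 0.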

If all went well, you can follow the job’s progress in your terminal with kubectl logs! It shows that our deep learning job is distributed across 4 workers.

# Inspect training logs
kubectl logs -n ${NAMESPACE} -f ${COMPONENT}-master

This shows the training logs (I omitted several warning messages that you can safely ignore).

...
========================   JOB MAP   ========================

Data for node: train-mnist-worker-0.train-mnist.kubeflow Num slots: 16   Max slots: 0    Num procs: 1
       Process OMPI jobid: [13015,1] App: 0 Process rank: 0 Bound: N/A

Data for node: train-mnist-worker-1.train-mnist.kubeflow Num slots: 16   Max slots: 0    Num procs: 1
       Process OMPI jobid: [13015,1] App: 0 Process rank: 1 Bound: N/A

Data for node: train-mnist-worker-2.train-mnist.kubeflow Num slots: 16   Max slots: 0    Num procs: 1
       Process OMPI jobid: [13015,1] App: 0 Process rank: 2 Bound: N/A

Data for node: train-mnist-worker-3.train-mnist.kubeflow Num slots: 16   Max slots: 0    Num procs: 1
       Process OMPI jobid: [13015,1] App: 0 Process rank: 3 Bound: N/A

=============================================================
==========================================
Num process (COMM_WORLD): 4
Using GPUs
Using hierarchical communicator
Num unit: 1000
Num Minibatch-size: 100
Num epoch: 20
==========================================
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.285947    0.106961              0.917333       0.9681                    16.6241
2           0.0870434   0.0882483             0.9736         0.9708                    23.0874
3           0.050553    0.0709311             0.9842         0.9781                    28.6014
...
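
Two quick checks can be run against a saved copy of this log: that the job map lists one MPI rank per worker, and how the validation accuracy evolves per epoch. The sketch below uses a few lines copied from the log above as sample input; the temporary file and the RANKS and ACC variable names are assumptions made for illustration.

```shell
# Sample lines copied from the training log, saved to a temporary file.
LOG="$(mktemp)"
cat > "${LOG}" <<'EOF'
       Process OMPI jobid: [13015,1] App: 0 Process rank: 0 Bound: N/A
       Process OMPI jobid: [13015,1] App: 0 Process rank: 1 Bound: N/A
       Process OMPI jobid: [13015,1] App: 0 Process rank: 2 Bound: N/A
       Process OMPI jobid: [13015,1] App: 0 Process rank: 3 Bound: N/A
epoch       main/loss   validation/main/loss  main/accuracy  validation/main/accuracy  elapsed_time
1           0.285947    0.106961              0.917333       0.9681                    16.6241
2           0.0870434   0.0882483             0.9736         0.9708                    23.0874
3           0.050553    0.0709311             0.9842         0.9781                    28.6014
EOF

# Number of MPI ranks in the job map; it should match the number of workers.
RANKS="$(grep -c 'Process rank:' "${LOG}")"
echo "ranks: ${RANKS}"

# Epoch number and validation accuracy for each six-column result row.
ACC="$(awk '$1 ~ /^[0-9]+$/ && NF == 6 { print $1, $5 }' "${LOG}")"
echo "${ACC}"

rm -f "${LOG}"
```

For a real run, the log file could be produced with kubectl logs -n ${NAMESPACE} ${COMPONENT}-master redirected to a file.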

About Chainer

Chainer is a Python-based, standalone open source framework for deep learning models. Chainer provides a flexible, intuitive, and high performance means of implementing a full range of deep learning models, including state-of-the-art models such as recurrent neural networks and variational autoencoders.
