
What is Kubeflow? Machine Learning Meets Kubernetes

Ishaan Bhola | Machine Learning



 

Machine Learning has of late reached a threshold where its applicability outweighs the initial complexity of getting it off the ground. Both old and new machine learning libraries have become quite mature and stable, but part of what still makes Machine Learning so hard in its current form is the inherent complexity of deploying ML models to the cloud or to production.

 

Testing an ML model on your laptop is one thing; building a production-ready machine learning system is a different ball game altogether, as it involves assembling a solution from off-the-shelf vendor products and custom-built components. The complexity of managing these services, even in a mid-level ML setup, creates a huge barrier to entry for ML. These stacks are also so tightly coupled to their clusters that they are almost immobile when it comes to moving them from a local environment to a production setup.

 

Kubernetes, on the other hand, has made deploying complicated stacks anywhere dramatically easier with its container orchestration and scaling technology. And since Kubernetes has always been big on extensibility, people have customised it massively for all kinds of workloads, including deploying their machine learning models.

 

What is Kubeflow?

 

 

Enter Kubeflow. Kubeflow intends to make deploying Machine Learning models to any infrastructure easier. It can make your ML workflows really simple, highly scalable and, more importantly, portable. It doesn’t focus on building a new ML library; rather, it provides world-class support to best-of-breed ML frameworks and gives them the infrastructure and ops love they need.

 

Kubeflow comes with a simple mission: make it really easy to scale and deploy ML models by leveraging the best of Kubernetes.

 

Advantages of Kubeflow:

 

  • It can help you scale your ML models on demand
  • It can help you manage the microservices in your ML setup
  • It can help you deploy to any infrastructure, whether that is your laptop, a training cluster or a prod cluster

 

Kubeflow is open source and under heavy development, with new capabilities and use cases being added consistently through additional tooling support.

 

Some existing use cases where Kubeflow can really move the needle include:

 

  • You want to use Jupyter notebooks to manage TensorFlow training jobs and extend them with more outside computing power (extra CPUs and GPUs)
  • You want to integrate TensorFlow with other services and processes
  • You want to deploy TF models in different environments, whether local, cloud or on-prem

 

Although Kubeflow was built with TensorFlow in mind, it plans to soon add support for other ML frameworks such as PyTorch, XGBoost and scikit-learn.

 

Using Kubeflow:

 

Let’s suppose you are working with two different Kubernetes clusters: a local minikube cluster and a GKE cluster with GPUs, and that you have two kubectl contexts defined, named minikube and gke.
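Before going further, you can confirm that both contexts exist (the context names below match this example; yours may differ):

     # List the kubectl contexts available on your machine
     kubectl config get-contexts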

First we need to initialize a ksonnet application and install the Kubeflow packages. (To use ksonnet, you must first install it on your operating system; the instructions for doing so are here.)
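For example, on macOS one way to install ksonnet is via its Homebrew tap (a sketch; check the official instructions for your platform and the latest method):

     # Install the ksonnet CLI (macOS / Homebrew)
     brew install ksonnet/tap/ks

     # Verify the install
     ks version

With ks available, initialize the application and pull in the Kubeflow packages: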

 

     ks init my-kubeflow  
     cd my-kubeflow  
     ks registry add kubeflow \  
     github.com/google/kubeflow/tree/master/kubeflow  
     ks pkg install kubeflow/core  
     ks pkg install kubeflow/tf-serving  
     ks pkg install kubeflow/tf-job  
     ks generate core kubeflow-core --name=kubeflow-core
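
As a quick sanity check, ksonnet can list the packages that were just installed:

     # List the packages installed in this ksonnet app
     ks pkg list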

 

We can now define environments corresponding to our two clusters.

 

     kubectl config use-context minikube  
     ks env add minikube  

     kubectl config use-context gke  
     ks env add gke  
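To confirm both environments were registered and point at the right clusters:

     # Show all ksonnet environments and their target clusters
     ks env list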

 

And we’re done! Now just deploy Kubeflow to each cluster. First, on minikube:

 

     ks apply minikube -c kubeflow-core  

 

And to deploy it on our multi-node GKE cluster for quicker training:

 

     ks apply gke -c kubeflow-core  

 

Because the same rich ML stack is easy to deploy everywhere, drift and rewriting between these environments are kept to a minimum.

To access either deployment, you can execute the following command:

 

     kubectl port-forward tf-hub-0 8100:8000  

 

and then open up http://127.0.0.1:8100 to access JupyterHub. To change the context used by kubectl, use either of these commands:

 

     # To access minikube  
     kubectl config use-context minikube  

     # To access GKE  
     kubectl config use-context gke  

 

When you execute apply, you are launching on K8s:

  • JupyterHub, for launching and managing Jupyter notebooks on K8s
  • A TF CRD (a custom resource definition for TensorFlow training jobs); you can inspect both as shown below
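
A rough way to inspect what landed on the cluster (pod names vary by Kubeflow version, so the output is illustrative):

     # List the pods created by kubeflow-core
     kubectl get pods

     # Confirm the TFJob custom resource definition is registered
     kubectl get crd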

 

Let’s suppose you want to submit a training job. Kubeflow provides ksonnet prototypes that make it easy to define components. The tf-job prototype makes it easy to create a job for your own code, but for this example we’ll use the tf-cnn prototype, which runs TensorFlow’s CNN benchmark.

 

To submit a training job, you first generate a new job from a prototype:

 

     ks generate tf-cnn cnn --name=cnn  

 

By default the tf-cnn prototype uses 1 worker and no GPUs, which is perfect for our minikube cluster, so we can just submit it.

 

     ks apply minikube -c cnn
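
Because the job runs as a custom resource, you can watch it with kubectl (the tfjobs resource name assumes the TF CRD installed by kubeflow-core; the worker pod name is illustrative):

     # Check the status of the submitted training job
     kubectl get tfjobs

     # Follow the logs of a worker pod (name is illustrative)
     kubectl logs -f cnn-worker-0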

 

On GKE, we’ll want to tweak the prototype to take advantage of the multiple nodes and GPUs. First, let’s list all the parameters available:

 

     # To see a list of parameters  
     ks prototype describe tf-job  

 

Now let’s adjust the parameters to take advantage of GPUs and access to multiple nodes.

 

     ks param set --env=gke cnn num_gpus 1  
     ks param set --env=gke cnn num_workers 1  

     ks apply gke -c cnn  

 

Note how we set those parameters so they are used only when you deploy to GKE. Your minikube parameters are unchanged!
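
You can verify the per-environment behaviour by listing the component’s parameters under each environment (a quick check using ksonnet’s param commands):

     # Parameters as resolved for GKE
     ks param list cnn --env=gke

     # Parameters as resolved for minikube (still the defaults)
     ks param list cnn --env=minikube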

After training, you export your model to a serving location.

Kubeflow includes a serving package as well. In a separate example, we trained a standard Inception model and stored it in a bucket we created called ‘gs://kubeflow-models’ under the path ‘/inception’.
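
If your exported model lives on local disk, one way to get it into such a bucket is gsutil (the local export path here is hypothetical):

     # Copy a locally exported SavedModel into the GCS bucket
     gsutil cp -r ./inception-export gs://kubeflow-models/inception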

 

To deploy the trained model for serving, execute the following:

 

     ks generate tf-serving inception --name=inception \  
     --namespace=default --model_path=gs://kubeflow-models/inception  
     ks apply gke -c inception  

 

This highlights one more option in Kubeflow - the ability to pass in inputs based on your deployment. This command creates a tf-serving service on the GKE cluster, and makes it available to your application.
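
To try the model server from your laptop, one option is to port-forward to the service the component created (the service name follows the --name we passed; port 9000 assumes TensorFlow Serving’s default gRPC port of that era, so adjust if your setup differs):

     # Forward a local port to the inception serving service
     kubectl port-forward svc/inception 9000:9000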


 


