From Notebook To Production Part 3

In Part 2 we used Docker to run a Flask application in a container. In this post we will use Kubernetes to host and scale that application, making its API available to anybody who wants to use it.

Put your code in a GitLab Repository.

Go to GitLab.com and create an account if you do not already have one.

Once you are logged in, create a new project.

Clone the repository to your computer:

# replace <USERNAME> with your GitLab username
git clone git@gitlab.com:<USERNAME>/notebook-to-production.git

Copy the application files into the folder. Add them to Git with git add ., then commit and push the code to your GitLab Repo:

git add .
git commit -m "Initial Commit"
git push 

Create a Kubernetes Cluster in the Digital Ocean Cloud

Following these steps on Digital Ocean will cost money

Create a Digital Ocean account and create a new project with your account.

Create Cluster

Inside the new project, add a Kubernetes cluster. It will take a while for your cluster to initialize.

Install Metrics Server

We will need the metrics server to use Horizontal Pod Auto-scaling later on.

After your cluster is done initializing, select Marketplace from the menu on the left.

Search the marketplace for “Kubernetes Metrics Server”.

Select Install App, then Install on Existing Cluster.

Select your cluster and install it.

This will install the metrics server on your Kubernetes cluster.

Install kubectl

kubectl is the command-line tool for accessing your Kubernetes cluster and using the Kubernetes API. The official Kubernetes documentation has great directions:

Install directions for kubectl

Download the kubeconfig

The kubeconfig is the configuration file that tells kubectl how to talk to your Kubernetes cluster. To download the config file, go to your Digital Ocean Kubernetes page.

Select Download Config and follow the instructions in the Quick connect with manual certificate management paragraph.

The output of the command should look similar to this:

[Screenshot: terminal output]

Now that we have kubectl installed and using the right config, we will integrate it with our GitLab project.

Create GitLab Service Account and Cluster Role Binding

Type the following commands into the terminal to create the accounts that GitLab will use to administer your cluster:

kubectl create serviceaccount --namespace kube-system gitlab-admin
kubectl create clusterrolebinding gitlab-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:gitlab-admin

The output will look something like this:

[Screenshot: terminal output of the two commands]

Integrate your Kubernetes Cluster with GitLab

Go back to your GitLab project page. Select Operations, then Kubernetes, from the left navigation bar. On the next page, select Add Kubernetes Cluster.

Then select Add existing cluster from the top. You should see a form that looks like this:

[Screenshot: Add existing cluster form]

Let’s go field by field:

1. Name: You can put whatever you want.
2. Environment Scope: Leave it as the default, which should be *
3. API URL:
   - This info will come from your config file.
   - Go to your command line and type: kubectl config view
   - Copy the URL after the server: key.
4. CA Certificate:
   - Go back to your Kubernetes front page and select Kubernetes Dashboard.
   - In the Kubernetes Dashboard, scroll down and select Secrets.
   - Select the gitlab-admin token.
   - On the next screen, select the eye icon next to ca.crt. It should look something like this. NOTE: This is not the actual value; portions have been redacted.
   - Copy the contents of the box and paste them into the CA Certificate box.
5. Service Token:
   - Do the same thing as step 4, but copy the value from the token section instead of ca.crt.
   - Paste that value into the Service Token box.
6. RBAC-Enabled: Leave this box checked.
7. GitLab Managed Cluster: Leave this box checked.
8. Project Namespace: Can be left blank.
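
If your config file lists several clusters, picking the right server: value by eye can be error-prone. The following is a minimal sketch, not part of the original walkthrough, that pulls the API URL for the current context out of the JSON printed by kubectl config view -o json. The function name and the sample config are hypothetical; the sample is trimmed to only the fields the function reads.

```python
import json


def api_url_for_current_context(kubeconfig_json: str) -> str:
    """Return the API server URL of the cluster used by the current context."""
    config = json.loads(kubeconfig_json)
    current = config["current-context"]
    # Find the context entry whose name matches current-context.
    context = next(c["context"] for c in config["contexts"] if c["name"] == current)
    # Find the cluster that context points at and return its server URL.
    cluster = next(
        c["cluster"] for c in config["clusters"] if c["name"] == context["cluster"]
    )
    return cluster["server"]


# Hypothetical, trimmed example of `kubectl config view -o json` output.
SAMPLE_KUBECONFIG = """
{
  "current-context": "do-example-cluster",
  "contexts": [
    {"name": "do-example-cluster",
     "context": {"cluster": "do-example-cluster", "user": "do-example-admin"}}
  ],
  "clusters": [
    {"name": "do-example-cluster",
     "cluster": {"server": "https://example-cluster.k8s.example.com"}}
  ]
}
"""

print(api_url_for_current_context(SAMPLE_KUBECONFIG))
```

In practice you would pass in the real output of kubectl config view -o json instead of the sample string, then paste the printed URL into the API URL field.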

Select Add Kubernetes Cluster.

Kubernetes has been integrated with your GitLab project. A few more steps remain to make sure GitLab can deploy the application to this cluster.

Select Applications

We will need to allow GitLab to install some applications onto our Cluster.

First, install Helm/Tiller. This will allow us to install the other needed applications.

Install Ingress: This will create more virtual machines, which will cost money. However, the ingress is needed to deploy your application. It exposes your application to the internet, routes incoming requests to it, and routes the application's responses back to the requesting host outside of the cluster.

Install cert-manager: This will issue a certificate to the cluster applications. It will allow you to use https when making requests to the application.

When the applications are finished installing, it should look like this:

[Screenshot: installed applications]

Select Details from the top. For the base domain, use the recommended value from the text below the box.
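
Judging by the application URL used later in this post, the base domain is built on nip.io, a wildcard DNS service that resolves a hostname containing an IP address to that address. The sketch below, with a hypothetical helper name, shows that mapping in case you want to check the suggested value against your ingress IP. It assumes the dashed form seen in that URL.

```python
import ipaddress


def nip_io_base_domain(ingress_ip: str) -> str:
    """Build a nip.io base domain that resolves to the given IPv4 address."""
    # Validate the input; raises ValueError for anything that is not IPv4.
    address = ipaddress.IPv4Address(ingress_ip)
    # nip.io accepts the address with dashes in place of dots.
    return str(address).replace(".", "-") + ".nip.io"


print(nip_io_base_domain("167.172.10.248"))  # 167-172-10-248.nip.io
```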

Your cluster is integrated and ready for GitLab CI/CD to push apps.

Use GitLab CI/CD and Auto DevOps to push your application to your Kubernetes Cluster.

We will enable GitLab Auto DevOps and allow GitLab to deploy applications to our cluster.

1. Go to Settings, then CI/CD, from the left menu bar.
2. Expand the Auto DevOps section.
3. Check the box that says Default to Auto DevOps pipeline.
4. Save changes.

This will start a new DevOps pipeline. It will more than likely fail. Don’t worry about this one. We are going to customize it a little bit.

Configure .gitlab-ci.yml

The .gitlab-ci.yml file gives GitLab directions on how to deploy your application. Auto DevOps probably works for most projects, but we want to customize things a little bit. To do this, we will add a new file called .gitlab-ci.yml.

Go back to your GitLab repository by selecting Project Overview from the left menu.

Use the plus sign to add a new file.

Name the file .gitlab-ci.yml

View this file and copy all of its contents into your new file.

Commit the changes. You should get a screen that tells you the file is valid.

We have just one more file to configure to make sure this application will do everything we need.

Configure auto-deploy-values.yaml

Go back to your repository and use the plus sign to create a new directory named .gitlab

Inside that directory, create a new file named auto-deploy-values.yaml

Copy all contents of this file into your new file. Save and Commit.

CI/CD Pipelines

GitLab will build your application into a Docker image, test it using the tests we built in Part 1, then deploy it into production. These actions are depicted visually via a Directed Acyclic Graph (DAG). To see this, go to CI/CD, then Pipelines, from the left menu. There you will see some failed pipelines (that is okay), and one should be running from our previous commit.

This pipeline will take a while to run. Once it deploys into the cluster, the certificate could take another 30 minutes to an hour to issue. This should only happen the first time you deploy the application. We can work around having to wait for a certificate for the purposes of this walkthrough.

After 15 to 20 minutes, the pipeline should be finished.

What is each stage doing?

Build: Uses your Dockerfile to build the Docker image and pushes it to your GitLab Container Registry.

Test: Uses the tests and the test script we wrote in Part 1 to perform unit tests.

Production: Deploys your application image to your Kubernetes cluster.

Performance: Tests the performance of your application while it is in production.

You can select any one of these objects in the graph and have a look at what it did.

Select the Production object. Towards the bottom, you should be able to see the URL of your application.

Test API

Running this Python script will send the payload to the API to be classified and print the response that comes back.

import requests 

# set the request URL
url = 'http://seancarey-notebook-to-production.167-172-10-248.nip.io/api'

# text to be classified by the model
# (adjacent string literals are joined, so no stray indentation ends up in the text)
payload = {"text": "The world's richest 1% have over twice the wealth of 6.9 billion people. "
                   "The planet will not be secure or peaceful when so few have so much and so many have so little.."}

# send request
# set verify=False while we are waiting for the certificate to be issued
res = requests.post(url, json=payload, verify=False)

# extract response json
data = res.json()

# print the model response
print(data)

The model predicts that the text in the payload variable is 94% Liberal. That isn’t a surprise, as it is from the Twitter account of Senator Bernie Sanders.

{'percent_in_class': 94.0, 'predicted_class': 'Liberal'}

Test Horizontal Pod Auto-Scaling

Kubernetes runs the Flask application in a pod. Each pod contains one instance of the application. Using the Horizontal Pod Auto-Scaling (HPA) feature of Kubernetes, we can have the cluster automatically add more pods of the app when the load increases to a certain level. We already set that feature up when we created the auto-deploy-values.yaml file.

deploymentApiVersion: apps/v1

hpa:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

resources:
  limits:
    cpu: 500m
  requests:
    cpu: 200m

This configuration tells the cluster to target an average CPU utilization of 80% per pod, measured against each pod's CPU request (200m here). When the average rises above that target, Kubernetes deploys more copies of the application, up to maxReplicas, to bring it back down.
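
To make the scaling rule concrete, here is a minimal sketch of the replica calculation described in the Kubernetes HPA documentation: desired = ceil(current replicas × current utilization / target utilization), clamped to the configured minimum and maximum. The function names are hypothetical, the defaults mirror the values in auto-deploy-values.yaml, and the sketch deliberately ignores the tolerance band and stabilization behavior of the real controller.

```python
import math


def utilization_percent(usage_millicores: float, request_millicores: float = 200) -> float:
    """CPU utilization as a percentage of the pod's CPU request."""
    return 100 * usage_millicores / request_millicores


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 80,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Simplified HPA rule: scale in proportion to utilization, then clamp."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# One pod using 320m of CPU against a 200m request is at 160% utilization,
# which is twice the 80% target, so the rule asks for two pods.
print(utilization_percent(320))      # 160.0
print(desired_replicas(1, 160))      # 2
print(desired_replicas(4, 400))      # 10 (capped at maxReplicas)
print(desired_replicas(3, 20))       # 1
```

Note that utilization can exceed 100% because it is measured against the 200m request, not the 500m limit.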

Let’s test the HPA configuration applied to this cluster.

First, get the namespace of your application. Kubernetes separates each deployment into separate namespaces as a way to keep different projects and users separated.
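
To see the autoscaler react, the application needs sustained traffic. The following is a rough sketch of one way to generate it from Python, not necessarily the method the original walkthrough used. The helper names, the request count, and the worker count are all assumptions; the send argument is any zero-argument callable that performs one request.

```python
import json
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def post_json(url, payload, timeout=10):
    """POST payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def generate_load(send, total_requests=200, workers=20):
    """Call send() total_requests times across a thread pool and tally outcomes."""

    def attempt(_):
        try:
            return send()
        except Exception as exc:  # record the failure type instead of stopping
            return type(exc).__name__

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(attempt, range(total_requests)))


# Example usage (replace the placeholder with your own application URL):
# url = "http://<your-app-url>/api"
# tally = generate_load(lambda: post_json(url, {"text": "some text"}))
# print(tally)
```

While the load is running, a command along the lines of kubectl get hpa --namespace <NAMESPACE> --watch in another terminal should show the replica count change, assuming the metrics server installed earlier is reporting CPU usage.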

Wrap up

Now that we have GitLab deploying the application for us, all we need to worry about is our code. Anytime we push a code change to the master branch of the repository, GitLab will push a new version of the application to replace the old one. You do not have to worry about building, testing, or deploying your application anymore.

Data Scientists can no longer live exclusively in their Jupyter Notebooks. Increasingly, employers expect a basic level of familiarity with DevOps and CI/CD practices. Why not add more skills to your toolbox and make yourself more marketable? Broadening your technical toolset outside of your comfort zone is just as important as honing your soft skills.

Sean Carey