From Notebook To Production: Project Introduction


If you are like most of us who are starting out in Data Science, you do most of your work in Jupyter Notebooks. Maybe you have finished your first project, where you collected data, cleaned it, and used it to train a model. The model has pretty good results, and you are proud of your work.

Now, what…?

How useful is that model while it is sitting in your notebook on your personal computer?

Hint: it is not.

Your model would be much more useful if it were being served as an API somewhere.

A typical pattern is for a Data Science team to hand their trained model to a Software Engineering team. The Software Engineering team then takes the model, embeds it in software, and serves it as an API.

What if the Data Science team could do this without the help of the Software Engineering team?

What if they had a method that automated taking this model and serving it as an API?

Why is this important to a Data Scientist?

Data Scientists can no longer live exclusively in their Jupyter Notebooks. Increasingly, employers expect a basic level of familiarity with DevOps and CICD practices.

What is DevOps: A set of cultural practices for automating processes between teams in a Software Development and IT environment. The core concept of DevOps is a shift away from highly specialized silos of workers towards a culture of teams that are highly collaborative and cross-functional.

What is CICD: CICD stands for Continuous Integration / Continuous Delivery. It is a method for quickly integrating code changes into a project and rapidly delivering those changes to working software in an automated fashion. Automate what can be automated, and let the computers do the repetitive work. The idea is to let developers focus on delivering working code and nothing else.

Project Introduction

In this blog series, I take a model I previously trained and host it in Kubernetes as an API in a Flask web application. I use GitLab, a unified CICD platform, to automate this entire process in a scalable and repeatable manner. In the first two parts, I create a containerized Flask web application that serves my model as an API. In Part III, I show how to use GitLab and a Kubernetes cluster hosted on DigitalOcean to automate and scale this process.

  • Part I: Build and Test a Flask API
  • Part II: Containerize a Flask API using Docker
  • Part III: Automate the Deployment of a Flask API to Kubernetes using GitLab Auto DevOps
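
To make "serving a model as an API" a little more concrete before diving into Part I, here is a minimal sketch of the idea. It is not the code from the series. The `predict` function is a hypothetical stand-in for a trained model (it just averages its inputs), and the `/predict` route and the `features` field are names I made up for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # silent=True returns None instead of raising on a missing or malformed JSON body.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None

    # Reject anything that is not a non-empty list of numbers.
    if (
        not isinstance(features, list)
        or not features
        or not all(isinstance(x, (int, float)) for x in features)
    ):
        return jsonify(error="expected a JSON body with a non-empty 'features' list"), 400

    return jsonify(prediction=predict(features))


if __name__ == "__main__":
    # Flask's built-in development server; fine for local experiments only.
    app.run(port=5000)
```

With the script running locally, a POST to `/predict` with a JSON body such as `{"features": [1.0, 2.0, 3.0]}` should come back with a JSON object holding a `prediction` value. In a real project, `predict` would load and call the trained model instead.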
