MLflow is one of the most stable and lightweight tools for helping Data Scientists manage the lifecycle of machine learning models. It offers a simple interface for viewing experiments along with powerful tools for packaging, managing, and deploying models, and it works with almost any machine learning library.
In the last article, we covered Kubeflow. MLflow is another tool for building MLOps, one that does not require Kubernetes. I will only briefly restate the main points:
- MLOps applies DevOps practices to machine learning.
- It standardizes the machine learning model development process.
- It reduces the time it takes to roll models out to production.
- It covers model tracking, versioning, and monitoring.
All this allows the business to get more value from machine learning models.
So, in the course of this article, we will:
- Deploy services in the cloud that act as a backend for MLflow.
- Install and configure the MLflow Tracking Server.
- Deploy JupyterHub and configure it to work with MLflow.
- Test manual and automatic logging of experiment parameters and metrics.
- Try different ways of publishing models.
We will do this as close to a production setup as possible. Most tutorials on the Internet suggest deploying MLflow on a local machine or from a Docker image. Those options are acceptable for familiarization and quick experiments, but not for production. We will use reliable cloud services instead.
If you prefer video tutorials, you can watch the webinar “MLflow in the Cloud. A quick and straightforward way to bring ML models into production.”
MLflow: Purpose And Main Components
MLflow is an open-source platform for managing the lifecycle of machine learning models. It also addresses experiment reproducibility and model publishing, and includes a central model registry. Unlike Kubeflow, MLflow can run without Kubernetes; at the same time, it can package models into Docker images that can then be deployed to Kubernetes.
MLflow consists of several components.
MLflow Tracking: A component with a convenient UI where you can view artifacts: graphs, data samples, and datasets. You can also view the metrics and parameters of the models. MLflow Tracking provides an API for logging metrics, parameters, and artifacts from different programming languages: Python, Java, R, and a REST API are supported.
There are two essential concepts in MLflow Tracking: runs and experiments.
- A run is a single iteration of an experiment. For example, you set the model's parameters and launch it for training. That single launch produces a new entry in MLflow Tracking; when the parameters change, the next launch creates a new run.
- An experiment groups multiple runs into one entity so that you can review them quickly.
You can deploy MLflow Tracking in various scenarios. We will use the variant closest to production – scenario #4 (as described in the official MLflow documentation).
In this scenario, JupyterHub is deployed on one host and communicates with the Tracking Server hosted on a separate virtual machine in the cloud. Experiment metadata is stored in PostgreSQL, which we will deploy as a cloud service, while all artifacts and models are stored separately in S3 object storage.
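For reference, a Tracking Server for this scenario is started with the `mlflow server` command, pointing the backend store at PostgreSQL and the artifact root at an S3 bucket. The template below is a sketch; the user, password, hosts, and bucket name are placeholders:

```shell
mlflow server \
  --backend-store-uri postgresql://mlflow_user:<password>@<db-host>:5432/mlflow_db \
  --default-artifact-root s3://<bucket-name>/artifacts \
  --host 0.0.0.0 \
  --port 5000
```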
MLflow Models: This component is responsible for packaging, storing, and publishing models. It introduces the concept of a flavor: a kind of wrapper that lets you use a model in various tools and frameworks without additional integrations. For example, there are flavors for Scikit-learn, Keras, TensorFlow, Spark MLlib, and other frameworks.
MLflow Models also allows you to expose models via a REST API and to package them into a Docker image for later deployment to Kubernetes.
MLflow Registry: This component is the central repository of models. It includes a UI that allows you to add tags and descriptions for each model. It also allows you to compare different models, for example, to see the differences in parameters.
The MLflow Registry manages the life cycle of a model. In the context of MLflow, there are three lifecycle stages: Staging, Production, and Archived. There is also support for versioning. All this allows you to manage the entire rollout of models conveniently.
MLflow Projects: It’s a way to organize and describe your code. Each project is a directory with a set of files, most often pipelines. Every project is described by an MLproject file in YAML format, which specifies the project name, environment, and entry points. This makes it possible to reproduce the experiment in a different environment. There is also a CLI and an API for Python.
With MLflow Projects, you can create modules that are reusable steps. These modules can then be embedded in more complex pipelines, allowing them to be standardized.
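For illustration, a minimal MLproject file might look like the sketch below; the project name, environment file, script, and parameter are hypothetical:

```yaml
name: demo-project

conda_env: conda.yaml        # environment the project runs in

entry_points:
  main:
    parameters:
      alpha: {type: float, default: 0.5}
    command: "python train.py --alpha {alpha}"
```

Such a project can then be launched with `mlflow run . -P alpha=0.1`, which reproduces the pipeline in the declared environment.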