One of the main reasons is a gap between the processes that data scientists and developers are responsible for.
The former focus on iteration and experimentation, while the latter dream of a fail-safe system at any cost.
But this gap is not the only problem machine learning faces in industrial development. A broader approach was therefore needed: one that would help establish communication between teams, increase the efficiency of ML projects, and reduce the complexity of their implementation. This approach is called MLOps, and it can be seen as an extension of DevOps.
What Is MLOps
The popularity of machine learning is growing, and with it the number of difficulties that technology teams and companies face: moving to the cloud, scaling, creating and managing machine learning pipelines, and many others.
In 2015, a team at Google published a study on hidden technical debt in machine learning systems. According to the article, model development is only a small part of the process; infrastructure maintenance, data collection and verification, configuration, and monitoring account for most of the work.
To optimize this large system, a new engineering culture of machine learning began to form. So, at the junction of DevOps, Data Engineering, and Machine Learning, MLOps appeared.
MLOps is a set of collaborative practices that helps build a transparent workflow around ML models. These practices improve quality, simplify management, and automate the deployment of machine learning and deep learning models in a production environment. Everyone is involved in this process, from managers with minimal programming knowledge to data scientists, DevOps engineers, and machine learning engineers. As a result, the models are easier to align with business and regulatory requirements.
Suppose a bank has started developing an ML model that will automatically assess a client's creditworthiness. As a rule, problems appear as soon as work on the model begins, and their number only grows from there. Let's see how MLOps helps to solve the main ones.
Joint work between different data specialists and teams is difficult to establish. They have different skills and use different tools. In the first stage, the data scientist develops the model, runs experiments, and trains it. After that, the model has to be packaged and given the environment it needs, monitoring has to be set up, and the model has to keep being trained on new data.
After the scoring model is developed, it must be connected to the banking system. But the developer responsible for this last stage may not know much about machine learning, which creates an unforeseen risk to how the model works in production. MLOps helps to establish interaction between teams with different profiles so that work at each stage is transparent and consistent with the project's objectives.
Models are data-sensitive. The development of ML models is very different from the development of conventional services: for example, any change in the incoming data stream can affect the quality of the model. If you feed a scoring model data on a thousand people over 30 years old and data on just a couple of hundred people in their 20s, the model is likely to be biased. Therefore, it is important to retrain it regularly as data accumulates and to monitor the quality of the training sample.
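The imbalance described above can be checked before training. The following is a minimal sketch, not a production monitoring tool: the decade-based grouping, the function names, and the 25% threshold are assumptions chosen only for illustration.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Bucket an age into a decade label such as "20s" (hypothetical grouping)."""
    return f"{(age // 10) * 10}s"


def underrepresented_groups(ages, min_share=0.25):
    """Return the age groups whose share of the sample is below min_share.

    min_share is an assumed threshold; a real project would pick it
    from business and regulatory requirements.
    """
    counts = Counter(age_group(a) for a in ages)
    total = sum(counts.values())
    return {
        group: count / total
        for group, count in counts.items()
        if count / total < min_share
    }


# Sample shaped like the example in the text: a thousand clients in their
# 30s and a couple of hundred clients in their 20s.
sample_ages = [35] * 1000 + [25] * 200
flagged = underrepresented_groups(sample_ages)
print(flagged)  # the "20s" group is flagged as underrepresented
```

A check like this could run every time the training sample is refreshed, so that the team notices a skewed sample before the model is retrained on it.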
Teams spend time on the same experiments. If specialists are not synchronized with each other, they may run the same experiments independently and waste resources. A single catalog of models lets you build a library of models with descriptions of how they work. This is useful for tracking quality metrics and comparing experimental results.
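To make the idea of a single model catalog more concrete, here is a minimal in-memory sketch. The class names, fields, and the rule for detecting a duplicate experiment (same model name and same parameters) are assumptions for illustration; real teams usually rely on a dedicated experiment-tracking or model-registry tool.

```python
from dataclasses import dataclass


@dataclass
class ModelRecord:
    """One experiment in the catalog (hypothetical structure)."""
    name: str
    description: str
    params: dict   # hyperparameters; values are assumed to be hashable
    metrics: dict  # quality metrics, e.g. {"auc": 0.81}


class ModelCatalog:
    """A shared catalog that rejects duplicate experiments."""

    def __init__(self):
        self._records = {}

    def register(self, record: ModelRecord) -> bool:
        """Store the record; return False if the same experiment already exists."""
        key = (record.name, tuple(sorted(record.params.items())))
        if key in self._records:
            return False
        self._records[key] = record
        return True

    def best(self, metric: str):
        """Return the record with the highest value of metric, or None."""
        candidates = [r for r in self._records.values() if metric in r.metrics]
        return max(candidates, key=lambda r: r.metrics[metric], default=None)


catalog = ModelCatalog()
first = catalog.register(
    ModelRecord("scoring", "baseline model", {"depth": 3}, {"auc": 0.78})
)
repeat = catalog.register(
    ModelRecord("scoring", "same run by another team", {"depth": 3}, {"auc": 0.78})
)
deeper = catalog.register(
    ModelRecord("scoring", "deeper trees", {"depth": 6}, {"auc": 0.81})
)
print(first, repeat, deeper)
print(catalog.best("auc").description)
```

Here the second registration is rejected because an experiment with the same name and parameters is already in the catalog, which is the kind of duplicated effort the catalog is meant to expose. The metric values are made up for the example.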
Other problems that MLOps helps to solve:
- Reflecting changes in business goals. A model depends on ever-changing data, on keeping up its performance standards, and on keeping AI under control. Without MLOps, it is hard to quickly update the model's training to follow new business goals.
- Communication gaps. It is difficult for technical and business teams to find a common language, and this is one of the common reasons large projects fail.
- Risk assessment. There is a lot of controversy surrounding ML and DL: how objective these models are and whether they should have the right to make certain decisions. If a neural network recommends the wrong video to a user, the price of the error is low. But the cost of a mistake rises when an account is suspended or a mortgage application is rejected.