Machine learning is a race. The companies that are first to put machine learning models into production at scale will gain a huge advantage over their competitors and billions in potential revenue. But machine learning has a major usability problem: putting models into production at scale remains a significant challenge.
Organisations can create incredibly complex machine learning models, but it is hard to take huge datasets, apply them to different iterations of ML models and then deploy the successful iterations into production.
The machine learning landscape: where are we?
When it comes to working out complex correlations through conventional software development, “no one in the world is talented enough to figure these out, there’s too much noise in the data,” explained Eero Laaksonen, CEO at Valohai, a machine learning platform.
This is where ML and deep learning come into play: they act as the bridge between the start point and the endpoint. With ML, the system builds the function, and the outcome is the model. “This is different from software development because the developers just write the function,” continued Laaksonen. “With ML, it’s combining the code with the data to define the model.”
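To make that distinction concrete, here is a minimal Python sketch. It is an illustration only and not something described by Laaksonen or Valohai: the temperature data and the names `handwritten_rule`, `fit_line` and `learned_rule` are all assumptions chosen for the example. In the first case a developer writes the function directly; in the second, generic fitting code is combined with data and the resulting parameters are the "model".

```python
# Illustrative sketch: a hand-written function versus a function learned from data.
# All names and numbers here are hypothetical examples.

def handwritten_rule(celsius):
    """Traditional software development: the developer writes the function."""
    return 1.8 * celsius + 32.0


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Observed data: the only thing the learning code knows about the problem.
celsius = [-10.0, 0.0, 10.0, 20.0, 30.0]
fahrenheit = [14.0, 32.0, 50.0, 68.0, 86.0]

# Machine learning: code plus data produce the model (here, two numbers).
slope, intercept = fit_line(celsius, fahrenheit)


def learned_rule(c):
    """The 'model': the function defined by the fitted parameters."""
    return slope * c + intercept


print("hand-written:", handwritten_rule(100.0))
print("learned:     ", learned_rule(100.0))
```

Change the data and the learned function changes with it, even though not a single line of the fitting code has been edited. That is why the data has to be tracked as carefully as the code.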
Today, the working methods in ML are very similar to those of software development in the 1990s, and developers are under increasing pressure to deploy successful ML algorithms into production faster.
“ML models are still an art form” — Eero Laaksonen
Why is it difficult to put ML models into production?
There are a number of reasons why it’s difficult to put machine learning models into production:
1. Experiment reproducibility: the exact combination of code and data that a model was trained on can’t easily be reproduced.
2. Regulatory compliance: “ML can’t work in the wild west,” said Laaksonen. “Organisations and regulators need to figure out laws around automated decision-making, something that is more reliable from the human perspective.
“Europe has been proactive in this with GDPR, and it is a strike in the right direction, but it’s difficult to bake regulation into ML production. Banks have to be able to explain an automated decision they made six months ago: it’s the output of data and models.
“If the ML model is running in production, currently there is no way to determine what caused it to come to that decision, and this needs to change. You need traceability and, therefore, version control in ML is very important.”
3. Fast on-boarding of teams: Organisations want to grow their developer and ML teams, but have disparate datasets and lots of code. It’s difficult for people to come on board and see the projects, who’s working on them and where the data is. “And then there’s the issue of hiring data scientists, who are a hot commodity,” continued Laaksonen. “You need the ability to track what they’re doing if they leave, their projects and pipelines etcetera.”
4. Quick experiments: Speed is key, but with ML, it is very much a trial and error approach. The only way to try more things with ML is to put more hardware on it, which is quite challenging — every change must be tested with lots of data.
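The reproducibility and traceability points in the list above come down to recording exactly which code, data and settings produced a given model. The following Python sketch shows one simple way to do that with content hashes. It is an assumption-laden illustration, not how Valohai or any regulator specifies it: the function `fingerprint_run`, the record fields and the sample inputs are all hypothetical.

```python
# Illustrative sketch: fingerprint a training run so it can be traced later.
# The function name, record layout and sample inputs are hypothetical.
import hashlib
import json


def fingerprint_run(code: bytes, data: bytes, params: dict) -> dict:
    """Return a record that identifies the code, data and parameters of a run."""
    record = {
        "code_sha256": hashlib.sha256(code).hexdigest(),
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "params": params,
    }
    # Serialise with sorted keys so the same inputs always give the same ID.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["run_id"] = hashlib.sha256(canonical).hexdigest()[:12]
    return record


training_code = b"def train(rows): ..."          # stand-in for a real script
dataset_v1 = b"age,income,approved\n34,52000,1\n"
dataset_v2 = b"age,income,approved\n34,52000,0\n"  # one label changed

run_a = fingerprint_run(training_code, dataset_v1, {"learning_rate": 0.1})
run_b = fingerprint_run(training_code, dataset_v1, {"learning_rate": 0.1})
run_c = fingerprint_run(training_code, dataset_v2, {"learning_rate": 0.1})

print("same inputs, same ID:   ", run_a["run_id"] == run_b["run_id"])
print("changed data, new ID:   ", run_a["run_id"] != run_c["run_id"])
```

Stored alongside each deployed model, a record like this lets a team answer, months later, which dataset and which version of the code produced a particular decision.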
Putting machine learning models into production faster
Valohai, which means Lantern Shark in Finnish, solves these problems as a machine learning platform-as-a-service — it illuminates deep learning and machine learning.
The platform saves the datasets that have been used to run different ML models and shows who is accountable, the experimentation cost and what type of data is being used, among other metrics.
Effectively, the platform fast tracks the trial and error model necessary in machine learning by forking the data pipeline into new models.
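As a rough picture of what forking a pipeline into several experiments means, the Python sketch below prepares a dataset once and then trains one toy model per combination of settings, recording the result of each. It is a simplified assumption about the general idea, not Valohai's implementation: the functions `prepare`, `train` and `mse`, the data and the candidate settings are all invented for illustration, and the runs execute one after another rather than on autoscaled cloud hardware.

```python
# Illustrative sketch: one shared data-preparation step forked into several
# experiments. All names, data and settings are hypothetical.
from itertools import product


def prepare(raw):
    """Shared pipeline stage: scale inputs to the 0..1 range once."""
    top = max(x for x, _ in raw)
    return [(x / top, y) for x, y in raw]


def train(rows, learning_rate, epochs):
    """Fit y = w * x by plain gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in rows) / len(rows)
        w -= learning_rate * grad
    return w


def mse(rows, w):
    """Mean squared error of the model y = w * x on the given rows."""
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)


raw = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]
rows = prepare(raw)  # prepared once, reused by every experiment

results = []
for learning_rate, epochs in product([0.01, 0.1, 0.5], [10, 100]):
    w = train(rows, learning_rate, epochs)
    # Toy setup: the error is measured on the training rows themselves.
    results.append({
        "learning_rate": learning_rate,
        "epochs": epochs,
        "w": w,
        "mse": mse(rows, w),
    })

best = min(results, key=lambda r: r["mse"])
print("experiments run:", len(results))
print("best settings:  ", best["learning_rate"], best["epochs"])
```

Each experiment is independent of the others, which is why this kind of trial and error can be sped up simply by adding hardware and running the forks in parallel.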
“When running on the cloud developers can’t monitor how much processing power you’re using. With our platform you can save the outputs and see the results on your data storage. Organisations can test different ML models on saved datasets from different sources and can run all the executions automatically with autoscaling on different cloud hosting providers,” explained Laaksonen.
The aim is to move from raw data to production faster, and the platform gives organisations the ability to rerun the data pipeline, retrain the model and deploy it.
“We’re at the phase where businesses will deploy ML on their own, before realising they need specialists” — Laaksonen
There are a number of use cases, and Laaksonen referred to one of Valohai’s customers, TwoHat Security, which is building a model to stop distribution of child pornography. TwoHat Security is working with Canada’s law enforcement and universities to build, on top of Valohai’s platform, a machine vision model that detects sexual abuse material on darknets and in other hard-to-reach parts of the internet.
Other applications include predictive maintenance, risk prediction in financial services and, in telecommunications, forecasting where future towers should be located.