The old saying ‘you get out what you put in’ certainly applies when training an artificial intelligence (AI) algorithm. This is especially true in a business context, where the purpose of the AI may be to interact with customers, manage automated systems or mimic human decision making. It’s critical that the outcomes match the objectives. However, it’s also vital that companies are able to address any incidence of bias that may skew how an AI responds to instructions or requests.
The design and development phase of any new product is crucial because it allows businesses to run tests and to identify and eliminate flaws. If a design flaw is overlooked, or a product develops a fault after release, it can usually be addressed quickly: faulty devices can be recalled, while updates and patches can be issued to fix software problems. That’s all well and good for a typical software release, but dealing with an AI algorithm isn’t as straightforward.
AI algorithms are highly sophisticated systems designed to perform very specific tasks based on machine learning (ML). Trying to remove biases the AI has developed once it is in operation can be costly and time-consuming; it is also counterintuitive for a technology built to ‘learn’. It is far more effective to have a process in place that detects and eliminates bias during the design and development stages.
Bias is bad for business
An AI’s basic purpose and functionality are fed into its underlying algorithm. If the AI were to develop an inherent bias, it would have a detrimental effect on the algorithm. This could seriously affect the precision and efficiency the AI is expected to deliver, which in turn would limit its ability to fulfil its commercial requirements, all of which is bad for business.
Despite the best intentions of developers, bias can always find a way to permeate an AI algorithm. As with any learning process, the student is influenced by its teacher. The scope of an AI’s education is dependent on its curriculum. Not surprisingly, a more varied and diverse curriculum produces a more enlightened student. Equally, a larger and more diverse data set helps to produce more precise and more efficient AI algorithms capable of making smarter decisions.
Training data, testing outputs
Every successful AI algorithm is built on a foundation of training data. However, sourcing data that meets a business’s requirements can be a massive logistical challenge with significant overhead, especially if those requirements include catering for the mass market.
In-house teams of developers, software engineers and quality assurance specialists are typically drawn from the same age range, gender and background. Bias often creeps in during data collection and data labelling. When building an AI algorithm, it is therefore best not to rely on a single person or small group to provide the data used to train it; different types of data and inputs are needed.
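One simple way to catch collection-stage bias is to measure how evenly the labelled samples are spread across demographic groups before training begins. The sketch below is purely illustrative: the `speaker_group` field and the 10% representation threshold are assumptions, not part of any specific product described here.

```python
from collections import Counter

def demographic_balance(samples, key="speaker_group"):
    """Return each group's share of the dataset.

    `samples` is a list of dicts; `key` names the (hypothetical)
    demographic attribute recorded during data collection.
    """
    counts = Counter(s[key] for s in samples)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

# Toy dataset, deliberately skewed towards one age group.
data = (
    [{"speaker_group": "18-30"}] * 80
    + [{"speaker_group": "31-50"}] * 15
    + [{"speaker_group": "51+"}] * 5
)

shares = demographic_balance(data)
# Flag any group below an arbitrary 10% representation threshold.
underrepresented = [g for g, share in shares.items() if share < 0.10]
print(shares)
print("Underrepresented:", underrepresented)
```

Running a check like this during the design phase makes skew visible while it is still cheap to fix, rather than after the model has already learned from the imbalance.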
It would be far more productive to use a model that provides the AI algorithm with exposure to people and experiences that are much closer to the customers it will eventually serve. Businesses can use this model to train their algorithms to respond to real-world scenarios, detect where biases occur and reduce their potential impact.
Community-built algorithms
The successful sourcing and implementation of training data depends on the quantity, quality and diversity of the data itself. The only way a business can source and act on this data is to utilise a diverse pool of participants. Businesses need to be able to select from a community that offers them specific demographics, including gender, race, native language, location and skill set, along with any other filters that apply.
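Selecting participants against such demographic filters amounts to a simple query over the community pool. The sketch below shows one way this could work; the pool structure, field names and `select` helper are all hypothetical, not the API of any real crowdsourcing platform.

```python
# Hypothetical participant pool with demographic attributes.
participants = [
    {"id": 1, "gender": "female", "native_language": "en", "location": "UK"},
    {"id": 2, "gender": "male",   "native_language": "fr", "location": "FR"},
    {"id": 3, "gender": "female", "native_language": "en", "location": "US"},
    {"id": 4, "gender": "male",   "native_language": "en", "location": "UK"},
]

def select(pool, **filters):
    """Return participants matching every supplied demographic filter."""
    return [p for p in pool if all(p.get(k) == v for k, v in filters.items())]

# e.g. native English speakers based in the UK.
uk_english = select(participants, native_language="en", location="UK")
print([p["id"] for p in uk_english])  # → [1, 4]
```

In practice the pool would be far larger and the filters richer, but the principle is the same: the business specifies the demographics it needs, and the community is filtered to match.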
In fact, vast amounts of data are required to develop an effective algorithm, and most businesses are not equipped to source data at scale; they need the support of dedicated resources to deliver new software and services. A recent project to train a smart voice assistant for media and broadcast services required over 100,000 different voice utterances, eventually delivered by 972 different people who were assembled remotely to train the algorithm. It was an impressive feat: although speech can be simulated to some degree in a lab, an AI still requires exposure to a diverse range of real voices and accents.
Speech training is only one aspect of the syllabus. Crowdsourced solutions can also help businesses train AI algorithms to read handwritten documents. Another recent project required thousands of handwriting samples; quantity was again a key factor, as the algorithm required unique samples from the broadest range possible. Over 1,000 participants were remotely assembled to contribute handwritten documents and meet the demand for a diverse set of content.
It’s important to remove any unintended bias that might diminish the accuracy of the AI’s end results. No AI will ever be perfect, but it is constantly learning, and the best machine learning models are those built on large and diverse data sets. The best policy is to source training data from a pool that provides quantity, quality and diversity. Without diversity in the training data, the algorithm won’t be able to recognise a broad range of possibilities, which risks rendering it ineffective. Remote communities provide businesses with access to this data and supplement in-house development and testing capabilities. Crowdsourced testing can be used to train AI algorithms to study and recognise voices, text, images and biometrics, providing businesses with strong outputs that will serve the needs of a diverse customer base.