Artificial intelligence has a bias problem and it’s completely our fault. Last week, Amazon decided to ditch the secret AI tool it was using to source new recruits when it discovered the system was actively favouring male over female candidates. It transpires the network had been trained to vet applicants by observing patterns in CVs submitted over a 10-year period and, as one would expect given the persistent male dominance across the tech industry, the vast majority of those applications came from men.
This caused the AI to perceive male candidates as preferable, going so far as to actively penalise CVs that even mentioned women. Does this imply that the AI was in some way gender biased? Or does it prove that machine learning is still a product of its environment?
On a basic level, deep learning is about analysing past data to predict and understand the future, so if a misogynistic culture is being analysed, the computer will perpetuate and perhaps even amplify that culture. The Amazon fiasco is just the most recent example showing that these machines (and their masters) still have a lot to learn.
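How a model inherits bias from past decisions can be sketched with a toy example. The snippet below is not Amazon's system, whose details are not public; it is a minimal, hypothetical word-frequency scorer, and the function names (`train_word_scores`, `score_cv`) and the five-row history are invented purely for illustration. It assumes each word is scored by how often past CVs containing it were hired.

```python
from collections import Counter


def train_word_scores(history):
    """Learn, for each word, the share of past CVs containing it that were hired."""
    seen = Counter()
    hired_with_word = Counter()
    for words, hired in history:
        for word in set(words):
            seen[word] += 1
            if hired:
                hired_with_word[word] += 1
    return {word: hired_with_word[word] / seen[word] for word in seen}


def score_cv(scores, words):
    """Score a new CV as the average learned score of the words it contains."""
    known = [scores[word] for word in set(words) if word in scores]
    # Fall back to a neutral 0.5 when no word has been seen before.
    return sum(known) / len(known) if known else 0.5


# Invented history: the only CV mentioning "womens" happened to be rejected.
HISTORY = [
    (["python", "chess"], True),
    (["python", "rugby"], True),
    (["java", "rugby"], True),
    (["java", "chess"], True),
    (["python", "chess", "womens"], False),
]

scores = train_word_scores(HISTORY)
plain = score_cv(scores, ["python", "chess"])
flagged = score_cv(scores, ["python", "chess", "womens"])
print(f"without the word: {plain:.2f}, with the word: {flagged:.2f}")
```

Two otherwise identical CVs receive different scores only because one contains a word that past decisions happened to penalise. Nothing in the code mentions gender; the skew comes entirely from the history it was given.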
When we consider how visual content, specifically facial recognition, is being used more and more across a variety of applications, the impact that data bias can have becomes glaringly obvious, and navigating these pitfalls becomes ever more urgent. With current supervised learning systems we need to make sure our computers are fed the right data: good-quality data will lead to sound practices that augment and improve a business; bad-quality data will lead to practices that tarnish it.
Less is not necessarily more
Our world is currently swamped in around 2.5 quintillion bytes of data a day, a decent amount of which is unusable for supervised learning because it’s not labelled or structured.
Likewise, using small datasets in machine learning can lead to ‘overfitting’, where a model memorises the quirks of its few training examples instead of learning the general pattern; in other words, small data will teach a machine to see the trees, but not the wood. Larger and more diverse datasets are required if we want to avoid data bias and use deep learning in a legitimate way in the near future.
For example, if we’re training a network to recognise attractive faces, but only show it white faces, then it will learn that attractive faces can only be white. This is something the viral photo filter app FaceApp learned the hard way last year when it offered an option to make the photo look “hot,” which in practice just made everyone look ‘whiter’.
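The overfitting risk with small datasets can be illustrated with a deliberately simple sketch. It assumes a one-nearest-neighbour "model" that simply copies the label of the closest training example, which is memorisation in its purest form. The data, the true rule (label 1 when the value is at least 5) and all names are invented for illustration.

```python
def nearest_neighbour(train, x):
    """Predict by copying the label of the closest training example."""
    _, label = min(train, key=lambda example: abs(example[0] - x))
    return label


def accuracy(train, examples):
    """Share of examples whose predicted label matches the given label."""
    correct = sum(nearest_neighbour(train, x) == label for x, label in examples)
    return correct / len(examples)


def true_label(x):
    """The underlying rule the model is supposed to discover."""
    return 1 if x >= 5 else 0


# Four examples, one of which, (4, 1), is a mislabelled quirk.
SMALL_TRAIN = [(1, 0), (4, 1), (6, 1), (9, 1)]
# The same quirk, diluted by more varied examples.
LARGER_TRAIN = SMALL_TRAIN + [(0, 0), (2, 0), (3, 0), (5, 1), (7, 1), (8, 1)]
# Held-out points the model never saw, labelled by the true rule.
HELD_OUT = [(i + 0.4, true_label(i + 0.4)) for i in range(10)]

print("small set, on its own examples:", accuracy(SMALL_TRAIN, SMALL_TRAIN))
print("small set, on held-out points: ", accuracy(SMALL_TRAIN, HELD_OUT))
print("larger set, on held-out points:", accuracy(LARGER_TRAIN, HELD_OUT))
```

With this invented data the small set scores perfectly on its own examples but gets only 8 of the 10 held-out points right, because the single quirk dominates its neighbourhood. The larger set gets 9 of 10 right. Real models and datasets are far more complex, but the gap between training and held-out accuracy is the usual symptom to look for.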
Supervised vs unsupervised learning
In many ways, a neural network is a lot like an infant child that responds differently to guided and unguided learning. If you simply let that child loose without any guidance, it will learn, but it will learn what it wants to learn in an unpredictable way. This is unsupervised learning in a nutshell. Supervised learning, meanwhile, is a common training technique for AI and is all about giving that infant some direction: with visual data, that means teaching it to identify the contents of anything from photographs and video to graphics via specific and precise labelling and tagging.
Because this direction is narrow and focused, it is now commonplace for machines trained with supervised learning to match or even outperform human beings on specific image recognition tasks.
Unsupervised learning, meanwhile, lets the machine off the leash a little and allows it to absorb the data in its own way; looking for patterns and drawing connections and conclusions without the requirement of a guiding hand. Unsupervised learning can’t currently match the accuracy and effectiveness of supervised learning, of course, but deep learning is still in its infancy.
Eventually, unsupervised learning should be able to make sense of images without labels. Once we’ve crossed that threshold, the shortage of decent labelled training data will be irrelevant because we will no longer need to rely on labels.
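The difference between the two approaches described above can be made concrete with a small sketch. It assumes each item is reduced to a single made-up numeric feature. The supervised half learns one average per label (a nearest-centroid classifier) and the unsupervised half runs a basic two-group k-means on the same numbers with the labels removed. All names and values are hypothetical.

```python
def train_supervised(examples):
    """Labelled data: compute the mean feature value (centroid) for each label."""
    sums, counts = {}, {}
    for value, label in examples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Return the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def cluster_unsupervised(values, iterations=10):
    """Unlabelled data: split values into two groups with a basic 2-means loop.

    Assumes at least two distinct values, so neither group is ever empty.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return sorted(group_low), sorted(group_high)


# Invented feature values; the labels exist only for the supervised half.
LABELLED = [(1.0, "cat"), (1.2, "cat"), (0.8, "cat"),
            (5.0, "dog"), (5.3, "dog"), (4.7, "dog")]
UNLABELLED = [value for value, _ in LABELLED]

centroids = train_supervised(LABELLED)
print(predict(centroids, 1.1), predict(centroids, 4.9))
print(cluster_unsupervised(UNLABELLED))
```

The supervised model can name what it sees because it was told the names. The unsupervised one finds the same two groups on its own but has no idea what to call them, which is one reason labelled data still matters today.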
Loosening the leash
Despite there being an abundance of data available, much of it is unusable for deep learning purposes. Rather than gathering as much data as possible, perhaps the future of deep learning lies in unsupervised learning. After all, we might be around at the beginning to teach our kids the basics, but most of the really important stuff we learn and experience ourselves.
Success in deep learning currently depends on a steady supply of good, diverse, structured and labelled data that can be used to teach our machines to learn exactly what and how we want them to learn. In order for deep learning to truly evolve, however, the focus should rest less on labelled training data and more on working toward unsupervised learning techniques that don’t just process data but mimic the learning behaviour of human beings. Only then will we truly be able to let our machines do the thinking for us.