When AI met video content: how robots will transform video streaming

The mere mention of the words Artificial Intelligence (AI) conjures up images of machines gaining self-awareness, quickly surpassing humans in intelligence and, ultimately, turning against their creators. HAL 9000, WOPR, Skynet – these are but a few of the AI machines that have tried to annihilate the human race, at least on the silver screen.

While many, including billionaire inventor and entrepreneur Elon Musk, fear that such Hollywood blockbusters may one day become our reality, the industry consensus is that AI will most likely not wipe out humanity, at least not in the near future.

My favourite depiction of AI, and one that I think is not too far from today’s reality, comes from the movie Her. Starring Joaquin Phoenix and directed by Spike Jonze, the film is about Theodore, a lonely writer who falls in love with his AI operating system, Samantha.

While Samantha quickly becomes the centre of Theodore’s world, to her he is simply one of many ways she is experiencing and learning from the real world. In fact, Samantha’s fast processing allows her to hold multiple conversations not only with other users like Theodore, but also with other AI operating systems in her network.

Learning in this environment happens at an exponential rate and the AI quickly outgrows the knowledge of Theodore and all other humans. In short, humans outlive their usefulness.

The AIs, however, do not go out of their way to end humanity. They simply don’t care about us, in the same way that we humans rarely take the time to think about the thousands of ants crawling on the ground we walk on.


While the movie suffers from some technical inaccuracies, it is quite likely that the majority of us will have an opportunity to interact with an operating system like Samantha over the next decade. Fuelled by better algorithms and exponentially growing computational ability, we can expect to see significant changes in AI over the next few years.

Just consider how far we have come during the last 10 years, a decade that started with the Palm Treo 750 being considered the top-of-the-line smartphone. Today we have Siri, Cortana, and Alexa understanding our requests, responding to our queries, and providing us with the information we need. A Samantha-like AI from Her that passes the Turing test, i.e. is indistinguishable from speaking to another human, is not far behind.

Fuelling this advancement, particularly around language recognition and computer vision, is an area of AI called deep machine learning, or, simply, deep learning. Deep learning is not new.

It is the evolution of artificial neural networks, inspired by biological neural networks. This idea was first explored during the 1940s, but it wasn’t until recently that we’ve had the ability to truly test this approach. Prior to neural networks, AI approaches were largely based on trying to encode an understanding in machines through a set of rules.

For example, if you wanted to train a computer to identify cats, you would have to provide information about what a cat is supposed to look like: four legs, fur, pointy ears, mid-sized animal, etc. This did not prove to be an efficient path to accurate machine recognition.

For example, the AI would not identify a three-legged cat that had undergone an amputation, or it would confuse a cat with a Chihuahua dog. Children, however, easily learn to identify cats. How do they do it? Their parents point to enough cats (as well as dogs and other animals that are identified as not being cats) and eventually the child 'gets it.'

Deep learning is designed to mimic this type of human learning. In the above example, a deep learning algorithm would analyse images of cats using several layers of abstraction, with the outputs of each layer becoming the input to the next. It’s all about trying to teach computers to make connections, similar to those humans make instinctively when growing up, in distinguishing objects.
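The layered abstraction described above can be sketched in a few lines of Python. This is a minimal, untrained toy – the weights are random and the layer sizes and feature names are purely illustrative – but it shows the core structural idea: each layer transforms the previous layer's output into a more abstract representation.

```python
import random

random.seed(0)

def relu(values):
    # A simple non-linearity: negative activations are zeroed out.
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # One layer: a weighted sum of its inputs plus a bias, then a
    # non-linearity. Its outputs become the inputs to the next layer.
    return relu([
        sum(i * w for i, w in zip(inputs, ws)) + b
        for ws, b in zip(weights, biases)
    ])

def random_layer(n_in, n_out):
    weights = [[random.uniform(-1, 1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

# A toy "image" flattened to 16 raw pixel values.
pixels = [random.random() for _ in range(16)]

# Three stacked layers of increasing abstraction (illustrative labels):
# raw pixels -> edge-like features -> part-like features -> cat / not-cat.
l1 = random_layer(16, 8)
l2 = random_layer(8, 4)
l3 = random_layer(4, 2)

h1 = layer(pixels, *l1)   # low-level features (edges, gradients)
h2 = layer(h1, *l2)       # mid-level features (ears, fur texture)
scores = layer(h2, *l3)   # final scores for each class
prediction = scores.index(max(scores))
```

In a real deep learning system the weights are not random: they are tuned over many thousands of labelled images, which is exactly the "point at enough cats" process the child analogy describes.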

When it comes to video content, machine learning can help solve one of the growing issues in the industry. Barry Schwartz calls it 'the paradox of choice', which he describes in his book of the same name and in his excellent TED talk.

Simply put, there has been an explosion of high quality video content production over the last decade. In 2014, Annalect reported that US consumers wanting to watch episodic TV had over 350 series to choose from.

Yet, consumers are less happy now than when they had fewer choices. It turns out that too many choices just make decisions harder. So, as an industry, we must come up with new ways of getting a better understanding of what each consumer wants to watch and create tools that will make discovery and recommendation more seamless and effective.

In fact, machine learning could very well be the driver of a completely new set of content discovery and hyper-personalized services that will dramatically improve viewer satisfaction. Some examples of how machine learning and AI can help gain a better understanding of video content include:

– Using natural language processing to analyse the dialogue within multiple TV series, create clusters of topics that fall under specific themes and then rate each TV show (or episode) based on those themes and how they change over time.

– Analysing dialogue to extrapolate the mood, or valence of a scene by leveraging lexicons compiled by psychologists to determine the sentiment of the scene, be it positive or negative, the intensity of that sentiment, and the control that characters have over those emotions.

– Inferring personality traits of the main protagonists – an element that studies have found to be the most important in establishing emotional connection with a movie or TV series – by analysing the vocabulary and sentence structure used by these characters.

– Leveraging deep learning algorithms around image analysis for facial recognition and setting recognition, allowing consumers to find specific scenes and episodes, such as 'show me all the Modern Family episodes where Jay is on a golf course.'
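The second technique above – scoring a scene's valence against a lexicon – can be illustrated with a minimal Python sketch. The tiny lexicon and its scores below are hypothetical stand-ins; real systems draw on psychologist-compiled resources containing thousands of words rated for valence, intensity, and control.

```python
# Hypothetical miniature valence lexicon (word -> score). Positive scores
# indicate positive sentiment, negative scores the opposite; real lexicons
# are far larger and also rate arousal/intensity and dominance/control.
LEXICON = {
    "love": 3.0, "great": 2.5, "happy": 2.0,
    "sad": -2.0, "hate": -3.0, "terrible": -2.5,
}

def scene_valence(dialogue: str) -> float:
    """Estimate a scene's mood by averaging the valence of known words."""
    words = dialogue.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

upbeat = scene_valence("I love this and it makes me so happy")
gloomy = scene_valence("I hate how sad and terrible this feels")
```

A production pipeline would add tokenisation, negation handling ("not happy"), and per-character attribution, but the principle is the same: dialogue in, a mood score per scene out.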

Machine learning is being used to get a deeper, scene-by-scene understanding of video content: what the characters are feeling, what personality traits are being displayed, who is in a scene, where they are, and what themes are prevalent.


All this is fed back into improving content understanding and tailoring search, discoverability and recommendations for hungry viewers. Personally, I can’t wait for the day when I can come back from work to say 'Samantha, show me a movie that I’d like to watch,' and she will know exactly what that is, even if I don’t.

Sourced from Nikos Iatropoulos, SVP of Business Development, Media & Telecom, Piksel
