SafeUM
Home Blog Services Download Help About Recharge

Axarhöfði 14, 110 Reykjavik, Iceland

Iceland - 2015
SafeUM
Blog
Services
Download
Help
About
Recharge
Menu
Archive
TOP Security!
23 Nov 2016

Google’s DeepMind AI can lip-read TV shows better than a pro

Artificial intelligence is getting its teeth into lip reading. A project by Google’s DeepMind and the University of Oxford applied deep learning to a huge data set of BBC programmes to create a lip-reading system that leaves professionals in the dust.

The AI system was trained using some 5000 hours from six different TV programmes, including Newsnight, BBC Breakfast and Question Time. In total, the videos contained 118,000 sentences.

First the University of Oxford and DeepMind researchers trained the AI on shows that aired between January 2010 and December 2015. Then they tested its performance on programmes broadcast between March and September 2016. By only looking at each speaker’s lips, the system accurately deciphered entire phrases, with examples including “We know there will be hundreds of journalists here as well” and “According to the latest figures from the Office of National Statistics”.

Here is a clip from the database without subtitles:

And here’s the same clip with subtitles provided by the AI system:

AI shows the way

The AI vastly outperformed a professional lip-reader who attempted to decipher 200 randomly selected clips from the data set.

The professional annotated just 12.4 per cent of words without any error. But the AI annotated 46.8 per cent of all words in the March to September data set without any error. And many of its mistakes were small slips, like missing an ‘s’ at the end of a word. With these results, the system also outperforms all other automatic lip-reading systems.

“It’s a big step for developing fully automatic lip-reading systems,” says Ziheng Zhou at the University of Oulu in Finland. “Without that huge data set, it’s very difficult for us to verify new technologies like deep learning.”

Two weeks ago, a similar deep learning system called LipNet – also developed at the University of Oxford – outperformed humans on a lip-reading data set known as GRID. But where GRID only contains a vocabulary of 51 unique words, the BBC data set contains nearly 17,500 unique words, making it a much bigger challenge.

In addition, the grammar in the BBC data set comes from a wide diversity of real human speech, whereas the grammar in GRID’s 33,000 sentences follows the same pattern and so is far easier to predict.

The DeepMind and Oxford group says it will release its BBC data set as a training resource. Yannis Assael, who is working on LipNet, says he is looking forward to using it.

Lining up the lips

To make the BBC data set suitable for automatic lip reading in the study, video clips had to be prepared using machine learning. The problem was that the audio and video streams were sometimes out of sync by almost a second, which would have made it impossible for the AI to learn associations between the words said and the way the speaker moved their lips.

But by assuming that most of the video was correctly synced to its audio, a computer system was taught the correct links between sounds and mouth shapes. Using this information, the system figured out how much the feeds were out of sync when they didn’t match up, and realigned them. It then automatically processed all 5000 hours of the video and audio ready for the lip-reading challenge – a task that would have been onerous by hand.

The question now is how to use AI’s new lip-reading capabilities. We probably don’t need to fear computer systems eavesdropping on our conversations by reading our lips because long-range microphones are better for spying in most situations.

Instead, Zhou thinks lip-reading AIs are most likely to be used in consumer devices to help them figure out what we are trying to say. “We believe that machine lip readers have enormous practical potential, with applications in improved hearing aids, silent dictation in public spaces (Siri will never have to hear your voice again) and speech recognition in noisy environments,” says Assael.

Tags:
Google trends artificial intelligence
Source:
New Scientist
1846
Other NEWS
3 Jul 2020 safeum news imgage An encrypted messaging service has been infiltrated by police
4 May 2020 safeum news imgage Two-Factor Authentication ​What Is It and Why You Should Use It
12 Dec 2019 safeum news imgage Encryption is under threat - this is how it affects you
4 Nov 2019 safeum news imgage Should Big Decisions Be Based on Data or Your Intuition?
7 Jun 2018 safeum news imgage VPNFilter malware infecting 500,000 devices is worse than we thought
4 Jun 2018 safeum news imgage Hackers target Booking.com in criminal bid to steal hundreds of thousands from customers
1 Jun 2018 safeum news imgage Operator of World's Top Internet Hub Sues German Spy Agency
30 May 2018 safeum news imgage US says North Korea behind malware attacks
29 May 2018 safeum news imgage Facebook and Google targeted as first GDPR complaints filed
25 May 2018 safeum news imgage A new reason to not buy these cheap Android devices
24 May 2018 safeum news imgage Flaws in smart pet devices, apps could come back to bite owners
23 May 2018 safeum news imgage Google sued for 'clandestine tracking' of 4.4m UK iPhone users' browsing data
21 May 2018 safeum news imgage LocationSmart reportedly leaked phone location data onto the web
18 May 2018 safeum news imgage The SEC created its own scammy ICO to teach investors a lesson
17 May 2018 safeum news imgage Thieves suck millions out of Mexican banks in transfer heist
All news
SafeUM
Confidential Terms of Use Our technologies Company
Follow us
Download
SafeUM © Safe Universal Messenger

Axarhöfði 14,
110 Reykjavik, Iceland

Iceland - 2015