Facebook is planning a future where videos are as searchable as everything else on the Web.
Like companies such as Google and Baidu, Facebook is focusing a lot of its resources on artificial intelligence, or AI, to bolster its search capability.
“We’ve built an AI backbone that powers every aspect of the Facebook experience. It’s powered by a massive [computer processor] cluster,” Joaquin Quinonero Candela, Director of Applied Machine Learning at Facebook, said when speaking on Wednesday at Facebook’s F8 conference in San Francisco.
Facebook is applying this computing power to make searching videos as easy as searching images. Reverse image search on major search engines already lets you, for example, identify a person from nothing more than a photo.
“We’re applying face recognition to videos,” Candela said during the keynote presentation. He gave an example of a video showing two Facebook employees. As the video plays, their faces are automatically recognized and tagged. At the end of the video segment, two more people walk by in the background and their faces are also recognized and tagged.
“What you have is a video that has been tagged with the four actors in it,” Candela said. “Importantly, the specific frames – the time stamps where they appear – are tagged too. Imagine the power this could give you to search through tens of millions of videos and find specifically the ones that contain the people you’re interested in seeing. And jump straight to the place where they appear.”
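Facebook hasn't described its internal systems, but the idea Candela outlines, tagging people with the timestamps where they appear so you can jump straight to that moment, boils down to an inverted index from person to (video, timestamp) pairs. A minimal sketch, with all names and data invented for illustration:

```python
from collections import defaultdict

# Hypothetical index: person -> list of (video_id, timestamp in seconds).
index = defaultdict(list)

def tag_face(video_id, person, timestamp):
    """Record that `person` was recognized in `video_id` at `timestamp`."""
    index[person].append((video_id, timestamp))

def search(person):
    """Return every (video, timestamp) where `person` was tagged."""
    return sorted(index[person])

# Four people tagged in one demo clip; two appear later in the segment:
tag_face("demo.mp4", "alice", 1.0)
tag_face("demo.mp4", "bob", 1.2)
tag_face("demo.mp4", "carol", 8.5)
tag_face("demo.mp4", "dave", 8.7)

print(search("carol"))  # -> [('demo.mp4', 8.5)]
```

Searching millions of videos for a person then becomes a lookup, and the stored timestamp is what lets a player seek directly to the frame where they appear.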
Candela also said that Facebook is working on a speech recognition system for mobile phones that generates closed captions so you can understand what’s being said in the video without having to turn up the volume.
Image search will get smarter too
Candela also addressed another major focus of AI: an image classifier that helps the visually impaired but could also result in an “image search on steroids.”
He gave the example of trying to find a photo taken years ago when you no longer remember where it is. “It’s somewhere in some photo [collection] a friend of mine took but in a few years I will have forgotten who took it,” he said.
“We’re doing research on cracking the image open and understanding it at the individual pixel level. This is called image segmentation. We’re building the ability to understand the individual objects that compose an image.”
This will help the visually impaired to have “more immersive experiences” where the images “talk” when you swipe your finger over them and describe what’s in the image.
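In segmentation, the model's output is a label map the same size as the image, assigning every pixel to an object; a "talking" image then only needs to look up the label under your finger. A toy sketch of that idea, with the label set and the tiny mask entirely made up for illustration:

```python
import numpy as np

# Invented labels standing in for a segmentation model's classes.
LABELS = {0: "background", 1: "person", 2: "snow", 3: "tree"}

# A toy 4x6 "image" already segmented into per-pixel labels.
mask = np.array([
    [3, 3, 0, 0, 0, 0],
    [3, 3, 1, 1, 0, 0],
    [2, 2, 1, 1, 2, 2],
    [2, 2, 2, 2, 2, 2],
])

def describe_pixel(row, col):
    """What a 'talking' image might say when a finger rests on (row, col)."""
    return LABELS[int(mask[row, col])]

print(describe_pixel(1, 2))  # -> person
print(describe_pixel(3, 0))  # -> snow
```

The same per-object understanding is what would let a search query reference the objects in a scene rather than the image as a whole.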
One image he showed had a group of people standing on a ski slope. “The other thing we could do is, you could imagine us building image search on steroids where I can say, ‘Don’t just get me pictures of snow, get me that photo where the five of us are on skis on the snow and there’s a lake in the background and trees,’” he said.