Television - Speech recognition
Smartly Segmenting TV For VOD
It looks like a well-polished off-the-shelf HTML5 application. “But keep in mind that TexMix is actually a scientific demonstrator!” warns research engineer Sébastien Campion. “Our main goal was to gather and display the various findings of our research team. So we essentially packed into a single piece of software all the algorithms that TexMex scientists have come up with in recent years. A lot of this research was financed through the Quæro European project.”
Transcript-based Video Delinearization
As a preliminary step, a television newscast was recorded over a period of one month. “Then, in a fully automated fashion, TexMix was able to extract information from this corpus of files and lay out an interface allowing the viewer to browse the content.”
The different news topics are displayed as clickable thumbnails on a timeline. As soon as a video starts playing, a subtitle starts ticking. “This unpunctuated string of words illustrates our first research axis: speech-to-text. The thematic segmentation of the newsfeed is not image-based but soundtrack-based. We detect the lexical breaks in the word flow. We spot when the speaker switches from, say, politics to sports. This is the innovative approach that our sequence splitting hinges upon. Researcher Guillaume Gravier has contributed a lot to this field.”
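To give a rough feel for what detecting a lexical break in an unpunctuated word stream can mean, here is a minimal Python sketch. It is not the TexMex team's algorithm, which the article does not detail. It assumes a simple sliding-window approach in which the vocabulary just before each position is compared with the vocabulary just after it, and a break is declared where the two barely overlap. The function names, the window size `block` and the `threshold` value are all illustrative assumptions.

```python
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexical_breaks(words, block=4, threshold=0.1):
    """Return word indices where the vocabulary changes abruptly.

    `block` (window size in words) and `threshold` are hypothetical
    tuning values chosen for this toy example.
    """
    # Similarity between the `block` words before and after each position.
    scores = []
    for i in range(block, len(words) - block + 1):
        left = Counter(words[i - block:i])
        right = Counter(words[i:i + block])
        scores.append((i, cosine(left, right)))

    # Keep positions that are local minima below the threshold.
    breaks = []
    for k, (i, s) in enumerate(scores):
        prev_s = scores[k - 1][1] if k > 0 else 1.0
        next_s = scores[k + 1][1] if k + 1 < len(scores) else 1.0
        if s < threshold and s < prev_s and s <= next_s:
            breaks.append(i)
    return breaks


# Toy transcript: six "politics" words followed by six "sports" words.
transcript = ["vote", "election"] * 3 + ["goal", "match"] * 3
print(lexical_breaks(transcript))
```

On this toy transcript the only reported break falls at word index 6, where the politics vocabulary gives way to the sports vocabulary. Real transcripts are far noisier, since speech recognition errors and shared function words blur the boundaries.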
The strategy comes in handy for a variety of further purposes, as it brings powerful content-based navigation capabilities.
“Just roll over a thumbnail and relevant keywords will pop up: plane crash, Indonesia, fog... The viewer knows instantly what the news report is about. These keywords are then fed to search engines such as Google, Yahoo! or Bing. We retrieve, say, the first 100 pages of results, which we then re-process to refine the ranking. We end up with a selection of highly relevant links that point either to the very same story on the web or to additional information. That way, the user can access further information on the topic if need be.” With the content now duly identified, TexMix can also retrieve other videos on the same topic and thus offer a hypervideo navigation mode. “The relevant news reports are instantly displayed as clickable thumbnails on the timeline. By sliding the cursor, one can also limit or extend the selection to a desired time span. A week instead of a month, for instance.”
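The article does not say how the retrieved pages are re-processed. As one plausible illustration, the Python sketch below re-ranks a list of search results by the fraction of the segment's keywords found in each result's title and snippet. The `rerank` function, the dictionary layout of a result and the sample entries are all made up for this example. No real search engine API is called.

```python
def rerank(results, keywords, top_k=3):
    """Re-order search results by keyword coverage (illustrative only).

    `results` is a list of dicts with hypothetical "title" and "snippet"
    keys; `keywords` come from the transcript of the news segment.
    """
    kw = {k.lower() for k in keywords}
    if not kw:
        return results[:top_k]

    def coverage(result):
        text = (result["title"] + " " + result["snippet"]).lower()
        return sum(1 for k in kw if k in text) / len(kw)

    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=coverage, reverse=True)[:top_k]


# Made-up results standing in for what a search engine might return.
results = [
    {"title": "Weather report", "snippet": "Fog expected over the weekend"},
    {"title": "Plane crash in Indonesia", "snippet": "Thick fog blamed"},
    {"title": "Travel guide", "snippet": "Visiting Indonesia on a budget"},
]
best = rerank(results, ["plane crash", "Indonesia", "fog"], top_k=2)
print([r["title"] for r in best])
```

Here the story matching all three keywords rises to the top, ahead of pages that mention only one of them.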
The second research theme deals with named-entity recognition. “Grappling with proper names such as patronyms or toponyms can prove very tricky.” Indianapolis might end up being mistaken for Indian police. “Hence the need for more robust methods such as the ones proposed by Christian Raymond and Julien Fayolle. Once these entities are correctly identified, we have a grasp of the whos and the wheres. This knowledge opens the door to real-time geolocalization. Using a Google Map, TexMix can now pinpoint all the locations mentioned in the newscast,” thus offering another modality for browsing content.
On top of all this comes an image comparison function. “Take the example of a graph in the news showing poll ratings for a coming election. You might want to check other polls done throughout the whole campaign. So you are looking for similar graphs, similar images. TexMix offers that capability through a simple button.” Again, clickable thumbnails pop up in a jiffy. “The retrieval here took a mere 7 milliseconds for a database of 1.5 million frames. But we have another demo running about as fast with 10 million images. Such swift retrieval within huge databases is the hallmark of recent algorithms developed by researcher Hervé Jégou. It's the third research axis highlighted in the application.”
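Searching millions of frames in milliseconds generally means comparing compact signatures rather than the images themselves. The Python sketch below illustrates that general idea with random-hyperplane binary signatures compared by Hamming distance. This is a swapped-in textbook technique, not the algorithm used in TexMix, which the article does not describe. The feature vectors, the frame names and the 32-bit signature length are assumptions made for the example.

```python
import random


def make_hasher(dim, bits=32, seed=0):
    """Build a function mapping a feature vector to a `bits`-bit signature."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

    def signature(vec):
        sig = 0
        for plane in planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            # One bit per hyperplane: which side of it the vector falls on.
            sig = (sig << 1) | int(dot >= 0)
        return sig

    return signature


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def nearest(query_sig, index, k=2):
    """Return the k frame names whose signatures are closest to the query."""
    return sorted(index, key=lambda name: hamming(query_sig, index[name]))[:k]


# Hypothetical 4-dimensional descriptors standing in for real image features.
signature = make_hasher(dim=4)
poll_graph = [0.9, 0.1, 0.4, 0.2]
index = {
    "poll_graph_week1": signature(poll_graph),
    "inverted_frame": signature([-x for x in poll_graph]),
}
query = [2 * x for x in poll_graph]  # same direction, different scale
print(nearest(signature(query), index, k=1))
```

Because each frame is reduced to a single integer, a comparison costs one XOR and a bit count, which is what makes exhaustive scans over very large collections affordable. Production systems typically add an indexing structure on top so that only a fraction of the signatures is visited.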
This ability to quickly plow through large-scale video collections is a sine qua non for the automated exploitation of television archives accumulated over decades by institutions such as the French National Audiovisual Institute. “Archivists there are growing fond of our tool.” But the technology has also sparked the interest of a major French network. “They would like to see how TexMix could be used to enhance the user experience of their programs. A bilateral collaboration is being discussed for further work in this field.”