THOTH Research team
Learning visual models from large-scale data
Thoth is motivated by today's context, in which the quantity of digital images and videos available on-line continues to grow at a phenomenal speed: home users put their movies on YouTube and their images on Flickr; journalists and scientists set up web pages to disseminate news and research results; and audiovisual archives from TV broadcasts are opening to the public. There is thus a pressing and growing demand to annotate and index this visual content for home and professional users alike. Current object recognition and scene understanding technology mostly relies on fully supervised classification engines, and visual models are essentially (piecewise) rigid templates learned from hand-labeled images. The sheer scale of on-line data and the nature of the embedded annotation call for a departure from this fully supervised scenario. The main objective of the Thoth project-team is to develop a new framework for learning the structure and parameters of visual models by actively exploring large digital image and video sources (off-line archives as well as growing on-line content) and exploiting the weak supervisory signal provided by the accompanying meta-data.
The main objectives of the team are:
(i) designing and learning structured models capable of representing this visual information: We develop novel models for a more complete understanding of scenes, addressing all the component tasks. We propose to incorporate the structure in image and video data explicitly into the models. In other words, our models aim to satisfy the complex sets of constraints that exist in natural images and videos.
(ii) learning visual models from minimal supervision or unstructured meta-data: The approach we propose to address the limitations of the fully supervised learning paradigm aligns with “Big Data” approaches developed in other areas: we rely on the orders-of-magnitude-larger tra
(iii) large-scale learning and optimization: This part of our research concentrates on the design and theoretical justifications of deep architectures, with a focus on weakly supervised and unsupervised learning, and the development of continuous and discrete optimization techniques that push the state of the art in terms of speed and scalability.
An additional focus of Thoth is the collection of appropriate datasets and the design of accompanying evaluation protocols.
International and industrial relations
Thoth team members collaborate with academic research groups at UC Berkeley, University of Edinburgh, MPI Tübingen, University of Washington, the Inria WILLOW team, and IIIT Hyderabad (India), as well as with industrial partners such as Xerox Research Centre Europe, Facebook AI Research, Microsoft Research-Inria Joint Centre, and Google.
Research teams of the same theme:
- LINKMEDIA - Creating and exploiting explicit links between multimedia fragments
- MAGRIT - Visual Augmentation of Complex Environments
- MORPHEO - Capture and Analysis of Shapes in Motion
- PERCEPTION - Interpretation and Modelling of Images and Videos
- SIROCCO - Analysis, representation, compression and communication of visual data
- STARS - Spatio-Temporal Activity Recognition Systems
- WILLOW - Models of visual object recognition and scene understanding