THOTH Research team

Learning visual models from large-scale data

Team presentation

Thoth is a joint team of Inria and the Laboratoire Jean Kuntzmann. It started in January 2016 and is the successor to the LEAR team (2003-2015).

Thoth is motivated by today's context, in which the quantity of digital images and videos available online continues to grow at a phenomenal rate. Home users post their movies on YouTube and their images on Flickr, journalists and scientists set up web pages to disseminate news and research results, and audiovisual archives of TV broadcasts are opening to the public. There is thus a pressing and growing demand to annotate and index this visual content for home and professional users alike.

Current object recognition and scene understanding technology relies mostly on fully supervised classification engines, and visual models are essentially (piecewise) rigid templates learned from hand-labeled images. The sheer scale of online data and the nature of its embedded annotation call for a departure from this fully supervised scenario. The main objective of the Thoth project-team is to develop a new framework for learning the structure and parameters of visual models by actively exploring large digital image and video sources (offline archives as well as growing online content), and by exploiting the weak supervisory signal provided by the accompanying metadata.

Research themes

The main objectives of the team are:

(i) designing and learning structured models capable of representing this visual information: We develop novel models that give a more complete understanding of scenes by addressing all of their component tasks. We incorporate the structure of image and video data explicitly into these models; in other words, our models aim to satisfy the complex sets of constraints that exist in natural images and videos.

(ii) learning visual models from minimal supervision or unstructured metadata: To address the limitations of the fully supervised learning paradigm, our approach aligns with the “Big Data” approaches developed in other areas. We rely on training sets that are orders of magnitude larger, and that have recently become available together with metadata, to compensate for less explicit forms of supervision.

(iii) large-scale learning and optimization: This part of our research concentrates on the design and theoretical justification of deep architectures, with a focus on weakly supervised and unsupervised learning, and on the development of continuous and discrete optimization techniques that push the state of the art in speed and scalability.

Thoth also focuses on collecting appropriate datasets and designing the accompanying evaluation protocols.

International and industrial relations

Thoth team members collaborate with academic research groups at UC Berkeley, the University of Edinburgh, MPI Tübingen, the University of Washington, the Inria WILLOW team and IIIT Hyderabad (India), as well as with industrial partners such as Xerox Research Centre Europe, Facebook AI Research, the Microsoft Research-Inria Joint Centre and Google.