Karteek Alahari, permanent researcher within the THOTH project team
Karteek Alahari specializes in computer vision. Since October 2015, he has been a permanent researcher within the THOTH team (the new team formed from LEAR) at Inria Grenoble – Rhône-Alpes. He joined the laboratory two years earlier as a young researcher.
What did you study at university before you joined Inria?
Karteek Alahari: I studied computer science at university in Hyderabad, my home town in India. Hyderabad is an Indian version of Silicon Valley, nicknamed “Cyber City”. After obtaining my master’s degree, I chose to specialize in computer vision. The objective of this discipline is to enable computers to understand and analyze the content of photos and videos. In practical terms, the goal is to create algorithms which can reproduce, in real time, what the human brain does naturally. I was lucky enough to work on my thesis under the guidance of an expert in the field, Philip Torr from Oxford University. Between 2010 and 2013, I was a post-doc at Inria Paris, in the WILLOW project-team, led by Jean Ponce, which specializes in the visual recognition of objects and scenes. In 2013, I joined the THOTH* project team led by Cordelia Schmid, who had just received a European Research Council (ERC) grant, before being appointed a permanent researcher last October.
What are the subjects of your work?
Karteek Alahari: By 2018 or 2020, it is estimated that 80% of Internet traffic will be due to image or video content. During the last 12 months alone, nearly 380 billion images have been taken! In order for this mass of images and videos to be usable, we must develop systems to interpret and analyze them automatically, which is a challenging large-scale problem. Such systems can break down an image, pixel by pixel, in order to analyze it. Each pixel is then linked with a label, a sort of keyword, indicating which object or person in the image it belongs to. As an image can contain several million pixels, this requires a colossal number of computations. Combinatorial optimization enables us to perform these operations very efficiently. Using this optimization method, we can, for example, locate a person’s joints in a video frame and follow his or her movements throughout the entire video sequence. This method can then be used to search for keywords in image and video collections.
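To make the idea of per-pixel labeling concrete, here is a minimal sketch in Python. It is an illustration only, not the THOTH team's method. It assumes a single row of grayscale pixels, a hypothetical function name `label_scanline`, and a simple cost model: each pixel prefers the label whose reference intensity is closest to its own, and neighboring pixels pay a fixed penalty (`smoothness`, also an assumed parameter) when their labels differ. On a single row this cost can be minimized exactly with dynamic programming; full images typically call for other combinatorial optimization techniques.

```python
def label_scanline(intensities, label_means, smoothness):
    """Label each pixel of one image row by exactly minimizing
    (squared distance to the label's reference intensity)
    + (smoothness penalty for every pair of neighbors with different labels).
    Returns (labels, total_cost). All names here are illustrative assumptions."""
    k = len(label_means)
    # cost[j]: lowest cost of labeling pixels 0..i when pixel i takes label j
    cost = [(intensities[0] - m) ** 2 for m in label_means]
    back = []  # back[i - 1][j]: best label for pixel i - 1 if pixel i takes label j

    for i in range(1, len(intensities)):
        new_cost, pointers = [], []
        for j in range(k):
            # Choose the previous label that is cheapest once the penalty is added
            best_prev = min(
                range(k),
                key=lambda p: cost[p] + (smoothness if p != j else 0.0),
            )
            penalty = smoothness if best_prev != j else 0.0
            data_term = (intensities[i] - label_means[j]) ** 2
            new_cost.append(cost[best_prev] + penalty + data_term)
            pointers.append(best_prev)
        cost = new_cost
        back.append(pointers)

    # Recover the best labeling by walking the pointers backwards
    label = min(range(k), key=lambda j: cost[j])
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels, min(cost)


if __name__ == "__main__":
    row = [0.1, 0.2, 0.6, 0.1, 0.9, 0.8, 0.9]  # made-up pixel intensities
    # Label 0 stands for "dark object", label 1 for "bright object"
    print(label_scanline(row, [0.0, 1.0], smoothness=0.0))
    print(label_scanline(row, [0.0, 1.0], smoothness=0.5))
```

With `smoothness=0.0` every pixel is labeled on its own, so the third pixel (0.6) is marked as bright. With `smoothness=0.5` the same pixel is absorbed into the dark region around it, which shows how considering neighboring pixels jointly yields cleaner regions than labeling each pixel in isolation.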
How do you see the future of your discipline?
Karteek Alahari: Machine learning is one of the essential areas in which we must make progress. Currently, learning is said to be "supervised". For example, in order to teach an algorithm what a cat is, you must show it many images of different cats, from every viewpoint. The larger the number of images, the greater the recognition system’s stability and reliability. This “learning” process is very time-consuming: consider, for example, the time required to label each pixel in an image. We must optimize this step if our discipline is to move forward. This means we must develop algorithms that enable computers to work more independently ("unsupervised" algorithms), so that they can learn more efficiently. A few teams worldwide are working on this, and I would like to play a part in reaching this goal.