Emmanuel Vincent: making maths work for sound
The holder of a DEA (post-graduate diploma) in "Acoustics, Signal Processing and Computing Applied to Music" and a PhD from Paris VI University completed at IRCAM, Emmanuel Vincent was recruited as a research scientist at Inria Rennes - Bretagne Atlantique in 2006, after post-doctoral research at Queen Mary University of London.
Since 1 January 2013 he has been part of Inria Nancy–Grand Est and more specifically the Speech team led by Yves Laprie.
A harp player, Emmanuel wants to use signal processing and mathematics to improve sound quality. One of his central research avenues is source separation, which consists of extracting the individual sound sources that are simultaneously present in a recording. Although a great deal of progress has been made over the last twenty years, the question remains a research subject with great potential. Beyond much-publicised applications such as remastering and advanced speech recognition for mobile phones, source separation enables applications such as 3D sound reproduction and the remixing of musical recordings or film soundtracks, technology that is keenly awaited by the music and film industries.
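The problem can be pictured with a toy mixing model. The sketch below uses hypothetical signals and a made-up mixing matrix, not the methods developed by the team: two sources are mixed into two "microphone" signals, and when the mixing matrix is known, a simple inversion recovers the sources. The research challenge of *blind* separation is that, in practice, neither the matrix nor the sources are known.

```python
import numpy as np

# Two toy "sources": a slow sine and a faster square wave, standing in
# for e.g. a voice and an instrument. Purely illustrative signals.
t = np.linspace(0.0, 1.0, 8000)
s1 = np.sin(2 * np.pi * 5 * t)
s2 = np.sign(np.sin(2 * np.pi * 13 * t))
S = np.vstack([s1, s2])            # sources, shape (2, n_samples)

# Instantaneous mixing model: each microphone records a weighted sum
# of the sources, X = A @ S. The matrix A is invented for this sketch.
A = np.array([[1.0, 0.5],
              [0.4, 1.0]])
X = A @ S                          # two mixed "microphone" signals

# With A known, separation reduces to matrix inversion. Blind source
# separation must instead estimate A (or the sources) from X alone.
S_hat = np.linalg.inv(A) @ X
print(np.allclose(S_hat, S))       # True: sources recovered exactly
```

Real recordings add reverberation, noise and more sources than microphones, which is why statistical models of the sources, rather than a simple inverse, are at the heart of the field.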
Sound is often perceived as secondary and does not receive the same attention as video, which is at the heart of a battle between equipment suppliers and content providers. Yet imagine trying to watch a film or a televised debate without the sound... For this reason, most technical research into signal and audio data processing remains largely academic, and few small businesses invest in this area. The industry is nevertheless set to take off in the coming years: requirements and applications in multimedia, home automation, telephony and healthcare, together with the growth in personal videos, hearing aids and remote voice-activated interfaces, will demand continuous improvements in quality.
Audio signal processing requires hi-fi sound quality: the human ear is capable of recognising the slightest processing artefact.
The automatic or semi-automatic separation techniques developed by Emmanuel and his colleagues already save sound engineers a great deal of time compared with manual separation, and some even allow real-time separation of audio streams. Contracts are being negotiated with Canon's research laboratory, with Audionamix (a small firm) and with the MAIA sound-engineering studio. A new research avenue has just opened up in speech recognition in noisy environments. The FASST (Flexible Audio Source Separation Toolbox) software also supplies a set of modules enabling non-experts to rapidly build an algorithm suited to the characteristics of the recordings to be separated.
Emmanuel also intends to explore emerging problems relating to handling masses of data, language processing and audio knowledge. These questions, the subject of numerous studies in the fields of natural language and speech, remain largely unexplored with regard to music, environmental sounds and rare languages. At a scientific level, they raise the central issue of modelling and using the uncertainties in the signals, the models and the knowledge at every stage of processing. As regards applications, they could, for example, make it possible for anybody to automatically compose music to suit their mood, or for companies to improve their communication through sound design (1). This new form of design is still in its infancy, but it is likely to become a major speciality in marketing, where every object or structure will be given its own audio identity.
(1) According to the definition given by researchers at IRCAM, sound design consists in "making an intention heard".