ISCA Award for Emmanuel Vincent
On 6 September, Emmanuel Vincent, Inria research director in the Nancy centre's Multispeech team, received the International Speech Communication Association (ISCA) award for the best paper published in the journal Computer Speech and Language over the last five years. The winning paper, published in 2013, presents the conclusions of the first 'CHiME' speech recognition challenge, organised in 2011 by Emmanuel and his colleagues from the University of Sheffield.
What does your research work focus on within the Multispeech team?
E.V: “I mainly work on speech and ambient noises. One of the tasks we are interested in is processing complex sound scenes where, for example, there are several people talking at the same time in a noisy environment, far from the microphone. We are seeking to 'clean' the signal in order to increase its intelligibility but also to better analyse it automatically, identify the people who are talking and recognise what they are saying. We are also looking to detect and recognise ambient noises.”
What can you tell us about the ISCA award?
E.V: “ISCA is the international learned society that covers all research fields on speech; it brings together computer scientists, signal processors, phoneticians, linguists, etc.
In this case, the recognition is not for my own research but for an evaluation campaign I co-organised with colleagues from the University of Sheffield, in England, and for the scientific advances that resulted from it. This campaign had a real impact on the community and became a series of campaigns, the fifth edition of which has just ended.
In short, this scientific challenge focused on the recognition of voice commands in a noisy domestic environment from a distance of two metres. Speech recognition techniques have improved greatly in recent years. However, when we began thinking about this campaign in 2010, designing hands-free technologies was a significant challenge for both scientists and companies such as Google and Amazon, particularly in a domestic environment, where noise is exacerbated.
As the campaigns progressed, major advances were made. In 2015, we provided the participants with state-of-the-art software that made errors on 33% of the words transcribed in one use case. A year and a half later, the error rate had fallen to 2%! This use case is now resolved, and we have turned our attention to more difficult cases.”