A tool for separating audio sources
Nancy Bertin, CNRS researcher and member of the PANAMA research team
Developed at the Inria Rennes – Bretagne Atlantique centre, the FASST software is used by audio professionals to separate the various instruments playing in a piece of music or to isolate speech from ambient noise. With one reservation: the software remains difficult to use. Inria is therefore launching a technology development initiative to make it more user-friendly, as researcher Nancy Bertin explains.
The story begins in 2009. While working on new signal processing algorithms, the researchers on the PANAMA team wrote software to automatically separate the different audio sources in a recording. It can isolate a clarinet or an oboe in a chamber orchestra, for example. The tool is known as FASST, an acronym for Flexible Audio Source Separation Toolbox. Originally developed in the Matlab language, the prototype was then rewritten in C to reach a wider audience so that professionals could start to use it.
In practice, however, a problem remains. “Companies contact us regularly to tell us that they’re very interested in what we’re doing. ‘But your software... We couldn’t get it to work,’” recounts researcher Nancy Bertin. “It’s true that it is not so easy to use. Some settings must be adjusted and some aspects are not clear to non-specialists. This can put people off and limit its use to those who already know this type of tool well. It’s a pity, since it’s a good technology.”
To improve its user-friendliness, Inria is preparing to launch a new technology development initiative that will fund an engineering position for two years. “In addition, we plan to hire a second engineer with the team’s own funds.” The key word for this new phase: autonomy. “We hope to bring the software to a point where users can take it off the shelf and rely on default settings for the functionalities they cannot configure themselves.”
Converting instructions provided by the user
The new version should also “know how to convert instructions provided by the user into something the program can understand.” An example? “Certain software configurations require knowing the position of the audio source in relation to the microphones. Our tool does not currently understand a position expressed in meters. This input parameter is calculated and formulated in a more complex manner. The user who wants to indicate that the source is two meters from the microphone in a certain direction does not necessarily know how to express it as a parameter understood by the software. The tool will have to be able to take this information into account and translate it.”
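To give an idea of the kind of translation involved, here is a minimal sketch in Python. It is not FASST code, and the article does not say which parameter FASST actually expects. It assumes, purely for illustration, a pair of microphones and converts a position given as "distance in meters plus direction" into a time difference of arrival between the two microphones, a quantity often used to describe where a source is. The function names, the angle convention and the speed of sound value are all assumptions.

```python
import math

# Assumed value: approximate speed of sound in air at room temperature.
SPEED_OF_SOUND_M_PER_S = 343.0


def source_position(distance_m, azimuth_deg):
    """Convert a user-friendly position (distance, direction) to x/y coordinates.

    Hypothetical convention: the origin is the centre of the microphone pair,
    azimuth 0 degrees points along the x axis, 90 degrees along the y axis.
    """
    azimuth_rad = math.radians(azimuth_deg)
    return (distance_m * math.cos(azimuth_rad),
            distance_m * math.sin(azimuth_rad))


def tdoa_seconds(source_xy, mic_a_xy, mic_b_xy):
    """Time difference of arrival: travel time to mic A minus travel time to mic B."""
    distance_a = math.dist(source_xy, mic_a_xy)
    distance_b = math.dist(source_xy, mic_b_xy)
    return (distance_a - distance_b) / SPEED_OF_SOUND_M_PER_S


# Example: two microphones 10 cm apart on the x axis,
# and a source two meters away at an azimuth of 30 degrees.
mic_a = (-0.05, 0.0)
mic_b = (0.05, 0.0)
delay = tdoa_seconds(source_position(2.0, 30.0), mic_a, mic_b)
print(f"Delay between microphones: {delay * 1e6:.1f} microseconds")
```

With this convention, a source directly in front of the pair (azimuth 90 degrees) is equidistant from both microphones, so the delay is essentially zero. A source on the x axis gives the largest delay.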
Another piece of information that must be entered is the type of sources. “To separate the instruments in a musical work, it is currently more efficient to provide a model of these instruments. Here too, the user may not know how to calculate and express it. It consists of a large table of figures with frequencies. If the user could tell us, ‘I have a guitar, a piano and a violin,’ then we could provide pre-calculated models for these instruments.”
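The idea of choosing pre-calculated models by instrument name could look something like the following Python sketch. Everything in it is hypothetical: the `PRECOMPUTED_MODELS` table, the `models_for` function and the tiny placeholder tables are invented for illustration and do not reflect FASST's actual data or interface. Real models would be large tables computed from recordings.

```python
from typing import Dict, List

# A model here is a table of numbers: one row per frequency band.
SpectralModel = List[List[float]]

# Placeholder tables only (all zeros); they stand in for real,
# pre-calculated instrument models and carry no acoustic meaning.
PRECOMPUTED_MODELS: Dict[str, SpectralModel] = {
    "guitar": [[0.0, 0.0], [0.0, 0.0]],
    "piano": [[0.0, 0.0], [0.0, 0.0]],
    "violin": [[0.0, 0.0], [0.0, 0.0]],
}


def models_for(instruments: List[str]) -> Dict[str, SpectralModel]:
    """Return the pre-calculated model for each instrument the user names.

    Names are matched case-insensitively. A KeyError lists every
    instrument for which no model is available.
    """
    names = [name.strip().lower() for name in instruments]
    missing = [name for name in names if name not in PRECOMPUTED_MODELS]
    if missing:
        raise KeyError("No pre-calculated model for: " + ", ".join(missing))
    return {name: PRECOMPUTED_MODELS[name] for name in names}


# The user only states what is playing; the tables are looked up for them.
selected = models_for(["Guitar", "piano", "violin"])
print(sorted(selected))
```

The point of the design is that the user never sees the tables: naming the instruments is enough, and an unknown instrument produces a clear message rather than a silent failure.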
Beyond the world of music, the technology is particularly interesting for voice recognition. “We are in the midst of a boom in voice-controlled applications, be it on the phone with Apple’s Siri or in the home with Google Home and Amazon Echo. It works well on the phone. At home, however, the voice must be analysed despite the sound of the television or children playing. It must be separated out before it is sent to the voice recognition systems that control household applications: closing the shutters or turning on the lights.” In this area, the scientists contribute to VoiceHome, a project funded by the Single Interministerial Fund in which they work with Technicolor, Orange, Delta Dore, Voicebox and eSoftThings. The project also brings together the Inria Nancy centre and Loustic, a laboratory at Rennes 2 University that specialises in the uses of information and the acceptability of technologies.
For non-business users, the new version of FASST will remain available under a free licence such as the Affero GPL. You can also get to know the tool and test it using a demonstrator in the form of a web application provided by A||GO, Inria's software platform.
PANAMA is a joint project-team of Inria and CNRS.