Dynalips: A new voice in lip synchronisation

Changed on 06/02/2020
Nominated to the national jury of the i-Lab contest, organised by Bpifrance and the Ministry of Higher Education, Research and Innovation, Dynalips was created in 2015 within the Multispeech project team, jointly supported by Inria and Loria. This lip-sync technology is soon to be marketed by its own start-up. Let's find out more about it.
Facial motion capture room, with playback on screen
© Inria / Photo D. Betzinger

My name is Dynalips

I began life in 2011 as an idea inside the mind of my creator, Slim Ouni, a lecturer at the University of Lorraine and member of the Multispeech research team (a joint team supported by both Inria and Loria), but I only really began to take shape in 2015. Since that time, I have continuously perfected my capabilities, enabling more fluid synchronisation between speech and the lip movements of characters in animated films, video games or programmes targeted at the hard of hearing. What makes me particularly innovative is that I use speech to generate lip movements in 3D. “This process only takes between 30 and 40 seconds”, explains Slim Ouni. “This is very quick, and is a real advantage for studios; when lip-syncing is performed ‘by hand’, as it were, i.e. without the use of software, a whole day can be spent synchronising a sequence lasting only 30 seconds.”

My goal is to improve speech fluidity and to better anticipate lip movements, like humans do

As things currently stand, lip animation remains far from perfect in the majority of animated series and films: animators rely on phonemes (the term used in linguistics for the smallest distinctive unit in the speech chain) to determine lip movement. For a number of reasons, this method fails to deliver satisfactory results. “In order for a character’s articulation to be realistic, it’s not enough just to concatenate sounds one after the other”, continues Slim Ouni. “You need to be able to anticipate certain articulatory gestures.” Why? “When we pronounce a word like ‘cloud’, for example, there are phonemes that we prepare as soon as we start to pronounce the word: we start articulating the phoneme ‘ou’ of ‘cloud’ before tackling the ‘ke’ or the ‘le’.” This is what is known as coarticulation, and it is what Dynalips makes possible: “The technology starts with the audio and anticipates the forming of phonemes.”
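To make the difference between concatenation and anticipation concrete, here is a toy sketch in Python. It is not the Dynalips method, which learns articulation from recorded data; it is a deliberately simple look-ahead rule. The rounding values, the phoneme labels and the function names are all assumptions made for illustration.

```python
from typing import List, Optional, Dict

# Hypothetical lip-rounding targets: 0.0 = neutral lips, 1.0 = fully rounded.
# None means the phoneme does not impose a lip shape of its own.
ROUNDING: Dict[str, Optional[float]] = {"k": None, "l": None, "ou": 1.0, "d": None}
NEUTRAL = 0.0


def concatenate(phonemes: List[str]) -> List[float]:
    """Naive approach: each phoneme is treated in isolation."""
    return [NEUTRAL if ROUNDING[p] is None else ROUNDING[p] for p in phonemes]


def anticipate(phonemes: List[str]) -> List[float]:
    """Toy look-ahead: an unconstrained phoneme borrows the lip shape
    of the next phoneme that does impose one."""
    shapes: List[float] = []
    upcoming = NEUTRAL  # nothing follows the last phoneme
    for p in reversed(phonemes):
        target = ROUNDING[p]
        if target is not None:
            upcoming = target
        shapes.append(upcoming)
    shapes.reverse()
    return shapes


word = ["k", "l", "ou", "d"]
print(concatenate(word))
print(anticipate(word))
```

With the naive rule, the lips only round on the ‘ou’ itself; with the look-ahead rule, they are already rounded during the ‘ke’ and the ‘le’, which is the anticipation described in the quote.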

My technical specifications

Speech (whether acoustic or audiovisual) is a naturally multidisciplinary field: in order to understand how it is formed, scientists must have as firm a grasp of the research carried out by phoneticians as they do of the work of linguists or psychologists. The researchers working on the Dynalips project also contribute their own expertise in speech modelling. The technology that has emerged from these disciplines combines observation of human articulation with analysis of the resulting data using artificial intelligence, in order to “learn” automatically how to articulate.

My creator and project developer

In charge of the Dynalips project and responsible for setting up the start-up of the same name, Slim Ouni studied IT engineering before completing a PhD in IT, focused specifically on speech processing. He then set off for California, where he worked as a postdoc for three years, before returning to Inria Nancy-Grand Est as a teacher-researcher in 2004. He then joined the Multispeech project team, where he has since worked on “audiovisual speech”. In the opinion of this budding entrepreneur - who recently completed a training course on entrepreneurship at EM Lyon - speech must not be reduced to a simple acoustic signal, but must also encompass “all of the information conveyed by a person's face while they are speaking”.

My upcoming challenges

As a start-up, the team in charge of Dynalips - which includes a lead researcher, two doctoral students and an engineer - will primarily target companies operating in two sectors identified as high-growth: animation and video games. They will also work on the company’s internationalisation. “From the very outset, the solution was designed to be multilingual”, explains Slim Ouni. “Our models are currently in the process of being adapted for other languages, most notably English.”