Alexandre Papin - 23/08/2012

Matthias Gallé wins the accessit thesis prize

Matthias Gallé, a former PhD student in the Symbiose team at Inria, won the accessit thesis prize for his work in the field of bioinformatics.

How did you come to the field of bioinformatics?

 The aspect of computer science which probably attracts me the most is the opportunity to do interdisciplinary research.
More and more areas are becoming flooded by an increasing amount of data, and informatics is becoming crucial for getting information out of this data (let's not even speak of doing this efficiently).
This is particularly true for biology and, more specifically, molecular genetics. In addition, there is the baffling fact that the fundamental information which is transmitted takes the form of a sequential string. The possibility of measuring the information content of these sequences is very appealing to me (as it was at the beginning of my thesis). Although biology has strongly attracted computer scientists since Turing, I think that at the current stage we are only scratching the surface of what can be achieved using computational methods.

You just got the accessit thesis prize for your work within the Symbiose team at Inria. Can you tell us about your PhD?

 As I mentioned, I was interested in studying other sciences through an informatics lens. During my master's (in theoretical computer science, at FaMAF, University of Córdoba, Argentina) I spent a semester at the University of Campinas (Brazil), where I took a course on bioinformatics. This fueled my interest, which was further increased by an internship programme from Inria, which I did at Symbiose with François Coste (who then became my main PhD advisor). At that stage I knew I wanted to pursue a PhD, and the combination of bioinformatics, France and a good relationship with François settled my decision. I was funded by an Inria CORDI grant.
The general idea of the thesis was to model genetic sequences with context-free grammars, inspired by what had been done in natural language processing. After the start of the PhD, we obtained funding for a collaboration between Inria/CNRS and MINCyT (the Argentinian science agency). Thanks to this we could arrange mutual visits and exchange ideas with Gabriel-Infante López (who would become my co-advisor) and his team, which works, among other things, on developing parsing methods for natural-language texts.
My final dissertation was on a combinatorial problem called the Smallest Grammar Problem: finding a smallest context-free grammar that generates exactly one given sequence. We applied this to compress DNA sequences, to approximate Kolmogorov complexity (an incomputable measure of the randomness of a given sequence) and to discover structures in DNA.
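To give readers a feel for the problem, here is a minimal Python sketch of one simple way to build a small grammar for a single sequence. It is not the method of the thesis: it uses a greedy Re-Pair-style heuristic (repeatedly replace the most frequent adjacent pair of symbols by a new rule), and it assumes that the size of a grammar is the total length of its right-hand sides. The function names and the nonterminal names (`N0`, `N1`, ...) are chosen for illustration only, and the input is assumed to be a plain string of single-character symbols.

```python
from collections import Counter


def repair_grammar(sequence):
    """Greedy Re-Pair-style heuristic (illustrative, not the thesis method).

    Returns (body, rules): `body` is the right-hand side of the start rule,
    `rules` maps each new nonterminal to the pair of symbols it replaces.
    Assumes `sequence` is a string of single-character symbols, so the
    generated names "N0", "N1", ... cannot clash with input symbols.
    """
    body = list(sequence)
    rules = {}
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(body, body[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair is repeated, nothing left to factor out
        name = f"N{len(rules)}"
        rules[name] = pair
        # Replace non-overlapping occurrences of the pair, left to right.
        new_body, i = [], 0
        while i < len(body):
            if i + 1 < len(body) and (body[i], body[i + 1]) == pair:
                new_body.append(name)
                i += 2
            else:
                new_body.append(body[i])
                i += 1
        body = new_body
    return body, rules


def expand(symbols, rules):
    """Expand a list of symbols back into the string it generates."""
    parts = []
    for symbol in symbols:
        if symbol in rules:
            parts.append(expand(rules[symbol], rules))
        else:
            parts.append(symbol)
    return "".join(parts)


def grammar_size(body, rules):
    """Grammar size, assumed here to be the total length of all right-hand sides."""
    return len(body) + sum(len(rhs) for rhs in rules.values())


if __name__ == "__main__":
    dna = "ACGTACGTACGTACGT"
    body, rules = repair_grammar(dna)
    print("start rule:", body)
    print("rules:", rules)
    print("size:", grammar_size(body, rules), "vs sequence length:", len(dna))
```

The grammar produced this way generates exactly the input sequence, and on repetitive inputs it is smaller than the sequence itself, which is the link to compression. A greedy heuristic like this one gives no guarantee of finding a smallest grammar.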

Following this award, how do you see the rest of your career?

 Having lived in several different countries, I have had several culture shocks during my life, but none surprised me so much as the one that awaited me when I joined the Xerox Research Centre Europe. From applying theoretical computational ideas to bioinformatics, I moved to using statistical methods on natural language and other kinds of data (Xerox is involved in huge businesses like transportation, health care and customer care, to name just a few) with a more application-oriented goal in mind. The differences in language, research communities and motivations are subtle but nevertheless important.
At the same time, there is much that can be transferred from one application domain (like molecular genetics) to another (like natural language processing). The same applies inside a domain (like machine learning), where very different approaches exist to solve the same task. I am enjoying finding research opportunities in this process and adapting ideas, data structures and algorithms from one side to the other.
As I said in the beginning, I love the interdisciplinary doors that computer science opens. But to be able to explore them one has to be willing to stand on the threshold and to listen to both sides.

