Research team seminar

Mining Input Grammars for Security

  • Date: 3 October 2016
  • Location: 2 rue Simone Iff, 75012 Paris - Room Jacques-Louis Lions 2, Building C - 2:30 pm
  • Speaker(s): Andreas Zeller, Saarland University

Knowing which part of a program processes which parts of an input can reveal the structure of the input as well as the structure of the program. In a URL "", for instance, the protocol "http", the host "", and the path "path" would be handled by different functions and stored in different variables. Given a set of sample inputs, we use _dynamic tainting_ to trace the data flow of each input character, and aggregate those input fragments that would be handled by the same function into lexical and syntactical entities. The result is a _context-free grammar_ that accurately reflects valid input structure; as it draws on function and variable names, it can be as readable as textbook examples.
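
The idea in the paragraph above can be illustrated with a small toy sketch. It is not AUTOGRAM itself: the `Tainted` class, the hand-written `parse_url` function, and the sample URL with its `example.org` host are assumptions made for illustration only. Each fragment remembers which input characters it came from, and the variable names under which the parser stores fragments become the nonterminals of a grammar rule.

```python
class Tainted:
    """A string fragment that remembers its offset in the original input."""

    def __init__(self, text, start=0):
        self.text = text
        self.start = start

    @property
    def end(self):
        return self.start + len(self.text)

    def split_once(self, sep):
        """Split at the first occurrence of sep, preserving input offsets."""
        i = self.text.find(sep)
        if i < 0:
            return self, None
        head = Tainted(self.text[:i], self.start)
        tail = Tainted(self.text[i + len(sep):], self.start + i + len(sep))
        return head, tail


def parse_url(raw):
    """Hypothetical program under analysis: a toy URL parser.

    Returns a mapping from variable name to the tainted fragment stored in it.
    """
    url = Tainted(raw)
    protocol, rest = url.split_once("://")
    if rest is None:
        raise ValueError("expected '://' in URL")
    host, path = rest.split_once("/")
    fragments = {"protocol": protocol, "host": host}
    if path is not None:
        fragments["path"] = path
    return fragments


def derive_rule(raw, fragments):
    """Replace each tracked fragment by a nonterminal named after its variable."""
    parts, pos = [], 0
    for name, frag in sorted(fragments.items(), key=lambda item: item[1].start):
        if frag.start > pos:
            parts.append(repr(raw[pos:frag.start]))  # untracked text stays literal
        parts.append(f"<{name}>")
        pos = frag.end
    if pos < len(raw):
        parts.append(repr(raw[pos:]))
    return " ".join(parts)


def mine(raw):
    """Run the toy parser on one sample input and derive a rule from it."""
    fragments = parse_url(raw)
    rule = "<url> ::= " + derive_rule(raw, fragments)
    values = {name: frag.text for name, frag in fragments.items()}
    return rule, values


if __name__ == "__main__":
    rule, values = mine("http://example.org/path")
    print(rule)    # <url> ::= <protocol> '://' <host> '/' <path>
    print(values)
```

A real dynamic tainting tool would instrument the program under test rather than rely on a cooperating string class, and would generalize over many sample inputs instead of a single one.
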
In my talk, I show how our AUTOGRAM prototype derives such grammars automatically, and point out their uses in software engineering and security:

  • They facilitate reverse engineering of input formats as well as manually writing valid test inputs;
  • They produce high numbers of varied and valid inputs, thus facilitating automated robustness testing and fuzzing;
  • Integrated into a checking parser, they protect existing programs against invalid, unexpected, and malicious inputs and behaviors.
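
The last two uses listed above can be sketched with a small grammar. The grammar below is written by hand for illustration and is not output from AUTOGRAM; the names `GRAMMAR`, `generate`, and `is_valid` are hypothetical. The same grammar drives a random input generator (for fuzzing) and a checker that rejects any input it cannot derive.

```python
import random

# Hand-written toy grammar (an assumption, not a mined one).
# The first alternative of every rule is non-recursive, so forcing it terminates.
GRAMMAR = {
    "<url>": [["<protocol>", "://", "<host>", "/", "<path>"]],
    "<protocol>": [["http"], ["https"], ["ftp"]],
    "<host>": [["<word>"], ["<word>", ".", "<host>"]],
    "<path>": [["<word>"], ["<word>", "/", "<path>"]],
    "<word>": [["<letter>"], ["<letter>", "<word>"]],
    "<letter>": [[c] for c in "abcdefghijklmnopqrstuvwxyz"],
}


def generate(symbol="<url>", rng=random, depth=0, max_depth=8):
    """Expand symbol into a random string derived from GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit as-is
    alternatives = GRAMMAR[symbol]
    # Past max_depth, always pick the first (non-recursive) alternative.
    expansion = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in expansion)


def match(symbol, text, pos):
    """Return every position at which a derivation of symbol from pos can end."""
    if symbol not in GRAMMAR:
        return {pos + len(symbol)} if text.startswith(symbol, pos) else set()
    ends = set()
    for expansion in GRAMMAR[symbol]:
        positions = {pos}
        for part in expansion:
            positions = {e for p in positions for e in match(part, text, p)}
            if not positions:
                break  # this alternative cannot match
        ends |= positions
    return ends


def is_valid(text, start="<url>"):
    """Accept text only if the whole string derives from the start symbol."""
    return len(text) in match(start, text, 0)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate("<url>", rng))
    print(is_valid("http://example.org/path"))
    print(is_valid("http:/example.org/path"))
```

The backtracking matcher is adequate for short inputs and grammars without left recursion; a production checking parser would more likely use an established parsing algorithm.
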

This work was conducted with Matthias Höschele and Konrad Jamrozik and presented at ASE 2016 and ICSE 2016. It is part of the ERC SPECMATE project, funded by an ERC Advanced Grant.

Keywords: Gallium Mining Input Grammars Security
