Sites Inria

English version

Séminaire des équipes de recherche

Séminaire Prosecco : Embedding and compiling languages in Coq

© INRIA Sophie Auvin - P comme Protocole

The challenges of verified and secure compilation often involve translating between different languages while preserving semantic properties of the models.

  • Date : 7/11/2019
  • Lieu : Salle Jacques Louis Lions 2
  • Intervenant(s) : Paolo Torrini (Lab. Verimag - INP Grenoble)

A language can be embedded in Coq by defining the meaning of its constructs in the metalanguage (a shallow embedding), or more syntactically, by defining its AST as an abstract datatype and formalising explicitly its semantics (a deep embedding). The two approaches (often used in combination) have generally different qualities: a shallow embedding makes it easier to reason about denotational properties and extend the language; a deep embedding facilitates language manipulation - e.g. syntactic translations, small-step semantics, code optimisations. It is often useful to relate the two approaches by associating a deep embedding and a shallow one, by reflection (deep -> shallow) and reification (shallow -> deep).

From a theoretical point of view, the difference between shallow and deep embedding is sometimes compared to that between interpretation and compilation. In fact, systems such as Bedrock rely on a shallow embedding approach to represent high-level languages as macro-extensions of idealised assembly. On the other hand, the CompCert C compiler is based on chains of verified translation steps between deep embeddings, from the C frontend to the supported backends.

A natural question may arise: how scalable it is to relate deep and shallow embeddings by reflection and reification? Reflection needs to be provably sound, i.e. to preserve the semantics, and the correctness of reification results need to be verified. Both these tasks can often be accomplished semi-automatically when the reified syntax is very close to the corresponding shallow terms. But this may not be the case when we consider separate developments.

Proving semantic equivalence

We worked specifically on reflection soundness at the University of Lille in the context of the EU ODSI project, where the higher-level executable specification of a small kernel was formally developed as a shallow embedding in Coq, based on a first order monadic language (MC), on top of an abstract hardware model. In order to reify MC and support its translation to CompCert C, we developed a deep embedding (DEC) which allows for the import of generic specifications. Due to the presence of generic effects and mutual recursion in the language, both the operational interpreter of DEC and its reflection into MC result in large, complex terms, inductively defined in proof mode, which nonetheless we could prove to be extensionally equal by means of a seemingly scalable technique. The heart of this technique consists of packing the interpreter definitions together with the soundness statement, using subset types.

Using encryption to protect control flow

A general problem in security is to prevent tampering with control flow in program execution. In the NanoTrust project (a collaboration of Verimag and CEA) we address this problem relying on encryption of the binary code. At Verimag we developed an extension of the CompCert RISC-V backend with a symbolic, shallow model of encryption-decryption and an instrumentation of the deeply embedded backend, to target a secure processor co-developed at CEA, and we proved semantic correctness with respect to symbolic executions that take encryption into account. We think this approach could provide a reasonably strong protection for the forward edges in the control flow graph (less so for the backward edges, as long as the stack is not protected).

Mots-clés : Coq Prosecco Science Inria de Paris Recherche Equipe de recherche Séminaire Langages Languages

Haut de page

Suivez Inria