Helping Artificial Intelligence to Detect Malware

Changed on 22/02/2024
Baguette, a cybersecurity tool currently being designed at Inria, the French research institute for digital science, introduces an innovative way of making sense of malware execution traces. By building a model of dynamic signatures that captures the relevant data in a graph, this research prototype will make the information more intelligible to experts while also opening the door to automated analysis by an AI model. What is at stake is malware detection at run time.
[Image: global view of a ransomware execution graph. © Inria / Vincent Raulin]


Typically, to spot a malware intrusion, antivirus tools plow through the code of executable files. They scrutinize thousands of lines in search of instruction sequences that would betray malicious intent. To do so, they must maintain an exhaustive, up-to-date signature library of all the malware out there. Yet attacks are becoming more sophisticated. Today, bots can automatically churn out scores of malware variants whose signatures are still unknown.

Therefore, static analysis alone can’t cope anymore.

To raise security a notch, one must examine the many events that occur on the system: files being accessed, data being sent over the network... Dynamic analysis platforms collect all the execution traces left on the system. However, due to the sheer size of the data at hand, these traces remain barely intelligible. Spotting the needle in the haystack is anything but easy. And that’s where Baguette comes into play.

The experiments are conducted in the High Security Laboratory (LHS) of the Inria Research Center of Rennes University, in Brittany. First, a malware sample is run on a computer, and all execution traces are then retrieved. “Out of this execution report, Baguette will produce a very high-level synthesis of all the data it contains,” says Vincent Raulin, who works on this project as a PhD student in Cidre, a research team focusing on cybersecurity (Inria/University of Rennes/CNRS/CentraleSupélec).


The goal is to display as a graph all the links that might exist between different pieces of information. Sometimes, in order to ferret out malware, it is important to correlate two events. If a piece of software is reading data on the one hand and sending information over the network on the other, then the system might be under attack. Yet these links remain hard to spot, because the data is heterogeneous in nature, on top of being massive and scattered all over the place.


Vincent Raulin


PhD student, Cidre team

Linking Events in a Graph

Zoom in on a specific ransomware behavior: encrypting personal files
© Inria / Vincent Raulin

Baguette will leverage graph representation to highlight these telltale links. An example? “One can devise a pattern indicative of data ciphering. In the graph, it suffices to write that a vertex of type FILE is linked to a vertex of type DATA under two conditions. First: the data must have high entropy. In other words, there is a lot of variability in the sequence of characters, meaning it is not real text but ciphered or compressed data. Second: the file type can’t be identified, so it is not a compressed archive. If one can spot this structure, made simply of two vertices and a few conditions, in an execution graph, then one can conclude that a file has been ciphered. Hence ransomware is probably worming its way in.” The initial work exploring this approach was presented in a first publication, in 2022, at RESSI, a French conference on cybersecurity.
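The ciphering pattern described above can be illustrated with a minimal Python sketch. It is not Baguette's actual code: the `Vertex` class, the function names and the entropy threshold of 7.0 bits per byte are all assumptions made for illustration. It only shows how a FILE vertex linked to a DATA vertex could be tested against the two conditions (high entropy, unidentified file type).

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(data).values())


@dataclass
class Vertex:
    """Hypothetical graph vertex; field names are illustrative only."""
    kind: str                        # e.g. "FILE" or "DATA"
    name: str
    payload: bytes = b""             # bytes observed (DATA vertices)
    file_type: Optional[str] = None  # None when the type could not be identified


def looks_ciphered(file_v: Vertex, data_v: Vertex, threshold: float = 7.0) -> bool:
    """True when a FILE -> DATA edge satisfies both conditions of the pattern.

    The threshold is an assumed value, not one taken from the project.
    """
    return (
        file_v.kind == "FILE"
        and data_v.kind == "DATA"
        and file_v.file_type is None                       # condition 2
        and shannon_entropy(data_v.payload) >= threshold   # condition 1
    )


def find_ciphered_files(edges, threshold: float = 7.0):
    """Return the names of files whose FILE -> DATA edge matches the pattern."""
    return [f.name for f, d in edges if looks_ciphered(f, d, threshold)]
```

For instance, a file of unknown type whose content is uniformly distributed over all byte values would match, whereas a plain text file, or a recognized archive with the same high-entropy content, would not.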

Toward Real-Time Dynamic Analysis

Malware signatures

Although the work of learning dynamic signatures is still in its early stages, here is a preliminary result. These are some of the dynamic signatures that have been extracted from certain malware.

A global view of a virus

Each of the graphs generated by the Baguette project represents the entire ("global") execution of a sample of a certain type of malware. For example, this graph represents everything observed during the execution of a certain virus sample.

Examples of signatures

Here are 3 zoomed-in examples of malware signatures.


Building on these early findings, the scientists are now devising a model of dynamic signatures meant to systematically display the potential links that could exist between multiple events of interest. “In terms of behavior, a lot of things can be monitored,” points out LHS Research Engineer Alexandre Sanchez, “system calls, network traffic, CPU usage, memory, power consumption... The experiments in our lab enable us to study how the model stacks up against what really happens in the system and to what extent the malware is actually detected.”


The researchers are now turning to the next step. “In the graph, data is organized in a perfectly defined structure, a strict grammar, so to speak. With these very precise rules, it then becomes possible to build AI models upon the graph,” Raulin explains. “The patterns being so easy to craft, a machine learning model could define them on its own for a certain purpose. For instance, we could ask the AI to find among execution traces the 10 most significant patterns of a particular malware family. In this way, the algorithm could learn the behavioral signatures of that family. In practical terms, that would usher in dynamic antivirus software performing real-time analysis on massive data. Which would make work so much easier...”
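To give a rough idea of what “the 10 most significant patterns of a malware family” could mean in practice, here is a deliberately naive Python sketch. It is not the researchers' method and not a machine learning model. It assumes each execution trace has already been reduced to a set of patterns, written here as hypothetical `(source type, relation, target type)` tuples, and it ranks them by how much more often they occur in the family than in other samples.

```python
from collections import Counter


def top_patterns(family_traces, other_traces, k=10):
    """Rank patterns by (support in the family) - (support elsewhere).

    Each trace is an iterable of hashable patterns; this scoring rule is
    an illustrative assumption, not the one used by the project.
    """
    def support(traces):
        counts = Counter()
        for patterns in traces:
            counts.update(set(patterns))  # count each pattern once per trace
        n = max(len(traces), 1)           # avoid dividing by zero
        return {p: c / n for p, c in counts.items()}

    fam = support(family_traces)
    oth = support(other_traces)
    scored = [(fam[p] - oth.get(p, 0.0), p) for p in fam]
    # Highest score first; ties broken by the pattern itself for determinism.
    scored.sort(key=lambda sp: (-sp[0], sp[1]))
    return [(p, round(s, 3)) for s, p in scored[:k]]
```

A pattern present in every trace of the family and in no other sample gets the maximum score of 1.0, while a pattern that is common everywhere scores close to zero or below and drops out of the top of the ranking.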


Find out more about the Baguette project with Vincent Raulin (in French)
