Typically, to spot a malware intrusion, antivirus tools plow through the code of executable files. They scrutinize thousands of lines in search of instruction sequences that betray malicious intent. To do so, they must maintain an up-to-date and exhaustive library of signatures for all the malware out there. Yet attacks are becoming more sophisticated. Today, bots can automatically churn out scores of malware variants whose signatures are still unknown.
Therefore, static analysis alone can’t cope anymore.
To raise security a notch, one must examine the many events that occur on the system: files being accessed, data being sent over the network... Dynamic analysis platforms collect all the execution traces left on the system. However, due to the sheer size of the data at hand, these traces are barely intelligible. Finding the needle in the haystack is anything but easy. And that’s where Baguette comes into play.
The experiments are conducted in the High Security Laboratory (LHS) of the Inria Research Center of Rennes University, in Brittany. First, a malware sample is run on a computer, and all execution traces are then retrieved. “From this execution report, Baguette will produce a very high-level synthesis of all the data it contains,” says Vincent Raulin, who works on this project as a PhD student in Cidre, a research team focusing on cybersecurity (Inria/University of Rennes/CNRS/CentraleSupélec).
The goal is to display as a graph all the links that might exist between different pieces of information. Sometimes, to ferret out malware, it is important to correlate two events. If a piece of software is reading data on the one hand and sending information over the network on the other, then the system might be under attack. Yet these links remain hard to spot, because the data is heterogeneous in nature, on top of being massive and scattered all over the place.
PhD student, CIDRE team
Linking Events in a Graph
Baguette will leverage graph representation to highlight these telltale links. An example? “One can devise a pattern indicative of data ciphering. In the graph, it suffices to write that a vertex of type FILE is linked to a vertex of type DATA under two conditions. First: there must be high data entropy. In other words, a lot of variability in the sequence of characters, meaning it is not real text but ciphered or compressed data. Second: the file type can’t be identified, so it is not a compressed archive. If one can spot this structure, made simply of two vertices and a few conditions, in an execution graph, then one can conclude that a file has been ciphered. Hence ransomware is probably worming its way in.” The initial work exploring this approach was presented in a first publication, in 2022, at RESSI, a French conference on cybersecurity.
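To make the ciphering pattern concrete, here is a minimal Python sketch of the idea. It is not Baguette's actual code or pattern language: the `Vertex` class, the attribute names (`entropy`, `file_type`, `path`), and the 7.0 bits-per-byte threshold are all assumptions chosen for illustration. Only the structure (a FILE vertex linked to a DATA vertex, high entropy, unidentified file type) comes from the description above.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Vertex:
    kind: str                                  # "FILE", "DATA", ... (hypothetical labels)
    attrs: dict = field(default_factory=dict)  # free-form attributes for this sketch


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


ENTROPY_THRESHOLD = 7.0  # assumed cut-off; a real system would tune this


def looks_ciphered(file_v: Vertex, data_v: Vertex) -> bool:
    """The two-vertex pattern: FILE -> DATA, high entropy, unknown file type."""
    return (
        file_v.kind == "FILE"
        and data_v.kind == "DATA"
        and data_v.attrs.get("entropy", 0.0) >= ENTROPY_THRESHOLD
        and data_v.attrs.get("file_type") is None  # not a recognized archive
    )


def find_ciphered_files(vertices, edges):
    """Return the paths of FILE vertices matching the pattern.

    `edges` is a list of (source_index, target_index) pairs into `vertices`.
    """
    return [
        vertices[src].attrs.get("path")
        for src, dst in edges
        if looks_ciphered(vertices[src], vertices[dst])
    ]


# Tiny hand-built execution graph.
noisy = bytes(range(256)) * 4          # every byte value equally likely
plain = b"hello world " * 50           # ordinary repetitive text
vertices = [
    Vertex("FILE", {"path": "report.docx"}),
    Vertex("DATA", {"entropy": shannon_entropy(noisy), "file_type": None}),
    Vertex("FILE", {"path": "notes.txt"}),
    Vertex("DATA", {"entropy": shannon_entropy(plain), "file_type": "text"}),
]
edges = [(0, 1), (2, 3)]
suspicious = find_ciphered_files(vertices, edges)
```

In this toy graph only `report.docx` matches, because its content has maximal entropy and no identifiable type. The second condition is what keeps a legitimate compressed archive, which also has high entropy, from raising the alarm.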
Toward Real-Time Dynamic Analysis
Building on these early findings, the scientists are now devising a model of dynamic signatures meant to systematically display the potential links that could exist between multiple events of interest. “In terms of behavior, a lot of things can be monitored,” points out LHS Research Engineer Alexandre Sanchez: “system calls, network traffic, CPU usage, memory, power consumption... The experiments in our lab enable us to study how the model stacks up against what really happens in the system and to what extent the malware is actually detected.”
The researchers are now turning to the next step. “In the graph, data is organized in a perfectly defined structure, a strict grammar, so to speak. With these very precise rules, it becomes possible to build AI models upon the graph,” Raulin explains. “The patterns being so easy to craft, a Machine Learning model could define them on its own for a certain purpose. For instance, we could ask the AI to find among execution traces the 10 most significant patterns of a particular malware family. In this way, the algorithm could learn the behavioral signatures of that family. In practical terms, that would usher in dynamic antivirus software performing real-time analysis on massive data. Which would make work so much easier...”
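As a rough illustration of what "the 10 most significant patterns of a family" could mean, the Python sketch below ranks patterns by a simple frequency contrast: how much more often a pattern appears in traces of the family than in other traces. This is a stand-in, not the Machine Learning model the researchers have in mind. The pattern encoding (a tuple of source type, relation, target type) and the relation names are hypothetical.

```python
from collections import Counter


def support(traces):
    """Fraction of traces in which each pattern appears at least once."""
    counts = Counter()
    for trace in traces:
        counts.update(set(trace))      # count each pattern once per trace
    n = max(len(traces), 1)
    return {pattern: c / n for pattern, c in counts.items()}


def top_patterns(family_traces, other_traces, k=10):
    """Rank patterns by (support in family) - (support elsewhere)."""
    fam = support(family_traces)
    oth = support(other_traces)
    scored = {p: s - oth.get(p, 0.0) for p, s in fam.items()}
    # Highest score first; ties broken by pattern for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical patterns: (source vertex type, relation, target vertex type).
READ = ("PROCESS", "reads", "FILE")
SEND = ("PROCESS", "sends", "SOCKET")
WRITE = ("PROCESS", "writes", "FILE")

family = [[READ, SEND], [READ, WRITE]]   # traces from one malware family
others = [[SEND]]                        # traces from everything else
ranking = top_patterns(family, others, k=2)
```

Here `READ` ranks first because it appears in every trace of the family and nowhere else, while `SEND` is penalized for being just as common outside the family.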
Find out more about the Baguette project with Vincent Raulin (in French)