Increasing the security of machine learning for military applications

Changed on 18/07/2022

To accelerate research in artificial intelligence, the government recently set up 40 new chairs, four of which are co-financed by the Defence Innovation Agency (AID). Researcher Teddy Furon, a specialist in multimedia content security at the Inria Rennes - Bretagne Atlantique centre, has been awarded one of them. His project, called SAIDA, aims to protect machine learning algorithms for automated image classification.

Two jets in the sky. Image by Robert Waghorn from Pixabay

Allied tank or enemy tank? Automated image recognition is starting to be used by the military. But in war, any error of interpretation is costly. One question in particular concerns the armed forces: could the enemy corrupt a machine learning algorithm and confuse it completely?

Teddy Furon’s Artificial Intelligence (AI) chair aims to study potential vulnerabilities and explore the best defence mechanisms. “Machine learning consists of feeding the computer with examples. We present it with thousands of images. In each case, we indicate a classification. We tell it that the image represents a cat or a dog, for example. We then test it by showing it images it has never seen before. It will tell us whether it thinks it is a cat or a dog.” For the past ten years, the technique has worked very well.

An attacker in the acquisition chain

“The algorithm can even successfully classify images polluted by certain disruptions, such as sensor noise. But with what level of accuracy? Although the probability of an error is very low, in a military context, we would still like to be able to quantify it. Give it an order of magnitude. One in 100? One in 10,000? Today, science is mainly trying to improve the performance of algorithms. Far fewer people are working on the uncertainties.”
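One way to put an order of magnitude on such an error rate is to ask how many error-free tests are needed before a given rate can be claimed. The sketch below is an illustration only, not the chair's method. It assumes the disturbances are random and the test images independent, which rules out a deliberate attacker. The function names and the 95% confidence level are assumptions made for the example.

```python
import math

def error_rate_upper_bound(n_trials, alpha=0.05):
    """Upper confidence bound on the error rate after n_trials independent
    tests with zero observed errors (confidence level 1 - alpha).

    With a true error rate p, the chance of seeing no error is (1 - p)**n.
    The bound is the largest p for which that chance is still at least alpha.
    """
    return 1.0 - alpha ** (1.0 / n_trials)

def trials_needed(target_rate, alpha=0.05):
    """Number of error-free tests needed to claim 'error rate <= target_rate'."""
    return math.ceil(math.log(alpha) / math.log(1.0 - target_rate))

for rate in (1e-2, 1e-4):
    n = trials_needed(rate)
    print(f"target 1 in {round(1 / rate)}: {n} error-free tests needed")
```

Under these assumptions, roughly 300 error-free tests support a claim of "fewer than one in 100", and roughly 30,000 support "fewer than one in 10,000". Very small rates therefore become expensive to check by plain testing.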

Moreover, these uncertainties “may not be the result of random chance, but the work of an attacker who slips into the acquisition chain and doesn’t want a certain image to be recognised for what it is. I have an image of a dog, but I want the algorithm to say it is a cat”.

Is it possible? “Yes. Relatively easily. To fool the algorithm, you can add a slight disruption that resembles noise. But not just any kind. Specially made noise for a precise image and a precise algorithm.” For this tailor-made attack, “the enemy has every advantage. He can lift the lid. See how the algorithm works. Observe every stage. If he wants the algorithm to get it wrong at the end, he knows that at such and such a stage, he has to reverse this or that, and so on. If he succeeds, the image is no longer classified as a dog... but as a cat!”
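The difference between tailor-made noise and ordinary noise can be sketched on a deliberately simple classifier. The example below assumes a linear classifier whose weights the attacker can read (the "lifted lid"); real image classifiers are deep networks, so this is a toy illustration and not the attack studied by the chair. The weights, the image and the names `score` and `tailor_made_noise` are all hypothetical.

```python
import numpy as np

def score(w, b, x):
    """Toy convention: positive score means 'dog', negative means 'cat'."""
    return float(np.dot(w, x) + b)

def tailor_made_noise(w, b, x, overshoot=1.05):
    """Small perturbation, equal in size on every pixel, that flips the decision.

    Every pixel moves by eps in the direction that pushes the score
    towards the other class: -sign(score) * sign(w).
    """
    s = score(w, b, x)
    eps = overshoot * abs(s) / np.sum(np.abs(w))
    return -np.sign(s) * eps * np.sign(w), eps

rng = np.random.default_rng(0)
w = rng.normal(size=1024)               # hypothetical weights, 32x32 image
b = 0.0
x = rng.uniform(0.0, 1.0, size=1024)    # hypothetical "dog" image
if score(w, b, x) < 0:                  # make sure the clean image reads as dog
    w = -w

delta, eps = tailor_made_noise(w, b, x)
x_attacked = x + delta                  # clipping to the pixel range is omitted

# Random noise of exactly the same amplitude, for comparison.
x_noisy = x + eps * rng.choice([-1.0, 1.0], size=1024)

print("clean   :", score(w, b, x))
print("attacked:", score(w, b, x_attacked))
print("noisy   :", score(w, b, x_noisy))
```

In this sketch the crafted perturbation changes the sign of the score, while random noise of the same amplitude barely moves it, because its contributions largely cancel out across pixels.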

Adding randomness

This is where the search for defence begins. “The attacker might know too many details about the classifier. Maybe we should add an element of randomness so that the attacker never knows what algorithm he is up against. The rules of the game are now hidden.” Another question: “what are the causes of vulnerabilities during machine learning? It could be the structure of the algorithm, which is often a neural network made up of a succession of layers. There may be a snowball effect: a very slight disruption at the input is amplified at each layer and eventually changes the decision. Should we therefore modify the structure? Or adjust the learning process? In this case, we would provide images of dogs and cats as well as attacked images of cats. The algorithm would have to learn that an attacked image of a cat is still a cat. It’s not a dog.”
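The snowball effect can be made concrete with a small calculation. For a stack of linear layers, each followed by a ReLU, a disruption can grow by at most the product of the layers' largest singular values, because the ReLU never increases the distance between two inputs. The sketch below uses hypothetical 2x2 layers chosen so that the numbers are easy to follow; it illustrates the bound and is not a model of any real classifier.

```python
import numpy as np

def worst_case_gain(layers):
    """Upper bound on the amplification of a perturbation through the stack:
    the product of each layer's largest singular value (spectral norm)."""
    return float(np.prod([np.linalg.norm(W, 2) for W in layers]))

def forward(layers, x):
    """Apply each linear layer followed by a ReLU."""
    for W in layers:
        x = np.maximum(W @ x, 0.0)
    return x

# Ten identical hypothetical layers that stretch one direction by 1.5.
layers = [np.diag([1.5, 0.5]) for _ in range(10)]
x = np.array([1.0, 1.0])
delta = np.array([1e-3, 0.0])           # slight disruption at the input

shift = np.linalg.norm(forward(layers, x + delta) - forward(layers, x))
print("input disruption :", np.linalg.norm(delta))
print("output shift     :", shift)
print("worst-case gain  :", worst_case_gain(layers))
```

Here a gain of 1.5 per layer compounds to more than 50 over ten layers. Keeping each layer's gain close to 1 is one structural change of the kind the question above points to.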

Another aspect: human perception. “While the algorithm may get it wrong, the human eye can instantly recognise the dog in a disrupted image”. In the same way, it can easily detect the degradation of the attacked image. But attack techniques are progressing and becoming increasingly subtle.

This raises a question: “is it possible to design a disruption capable of fooling the classifier while remaining invisible to the naked eye by using a model of human perception?” In this case, visual inspection by a person would not detect degradation that could be the warning sign of an attack.

Teddy Furon's previous work focussed on digital watermarks and steganography, i.e. the art of hiding a signal in an image. “In watermarking, we know which area of a photo must not be affected because the human eye is highly sensitive to it. We would like to incorporate this knowledge into machine learning. In steganography, on the other hand, the defender has to detect micro disruptions that are invisible to the naked eye but are statistically perceptible. In this case also, certain knowledge must be incorporated. The defender could use a sensor to analyse the image, observe that there are suspicious statistical anomalies in a certain place and therefore make the classifier refuse to make a decision”.

Attack by oracle

In a more general approach, the defender could also place the classifier in a black box to prevent an attacker from trying to examine it too closely. But that doesn't solve everything. “Of course, the algorithm is no longer known. Nobody knows how it learned. But you can feed it an image to see what comes out. You can then see which class the picture is assigned to.” Question: “with such little knowledge, without lifting the lid but by being able to test it as much as they want, could an attacker ultimately succeed in fooling the classifier?”

This is called an oracle attack. “To go back to the image of a dog. I start by replacing it with the image of a cat”. The classifier is not fooled, but “if I draw a straight line between these two images, if I mix the two by iteration, there will come a moment when the result begins to look like the image of the dog I want to attack. I arrive exactly at the boundary. On one side of the line, the classifier answers dog and on the other side, cat. I can then say from which point it gets it wrong. I discover what the classifier is locally sensitive to in the image. Why it says dog instead of cat at that particular point”.
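The search for the boundary along the line between two images can be sketched as a bisection that only ever asks the black box for a label. The code below is a minimal illustration under stated assumptions: `find_boundary` is a hypothetical helper, and `hidden_classifier` stands in for the black box with a toy rule that the attacker does not see.

```python
import numpy as np

def find_boundary(oracle, x_target, x_start, tol=1e-3):
    """Bisect the segment between x_start and x_target until the label flip
    is located to within tol (measured as a blend weight).

    oracle(x) returns only a label. Returns the blend closest to x_target
    that is still NOT given x_target's label, and the number of queries used.
    """
    label_target = oracle(x_target)
    if oracle(x_start) == label_target:
        raise ValueError("x_start must receive a different label")
    queries = 2
    lo, hi = 0.0, 1.0    # weight of x_target: 0 is x_start, 1 is x_target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        queries += 1
        if oracle((1 - mid) * x_start + mid * x_target) == label_target:
            hi = mid     # still recognised: the boundary is closer to x_start
        else:
            lo = mid     # fooled: the boundary is closer to x_target
    return (1 - lo) * x_start + lo * x_target, queries

def hidden_classifier(x):
    """Toy black box, unknown to the attacker."""
    return "dog" if float(np.sum(x)) > 0 else "cat"

dog = np.ones(4)             # image under attack
cat = -3.0 * np.ones(4)      # starting point with the other label
x_boundary, n_queries = find_boundary(hidden_classifier, dog, cat)
print(hidden_classifier(x_boundary), n_queries)
```

Each query halves the uncertainty, so locating the boundary along one line is cheap. The costly part, as described next, is probing around that point to learn what the classifier is locally sensitive to.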

Data provided by another country

This is where the problems start. “I can learn this local model to decide what disruptions to add: which pixels to change and with what values”. The downside: “to learn how the classifier behaves around the sensitive point, we have to send lots of requests to the black box. But it is possible”. To counter this: “we would have to install a mechanism in the black box that detects when an attacker sends images aligned on a single line and tries to perform a bisection search. The algorithm will then stop responding to protect itself”.
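A detection mechanism of this kind could, in its simplest form, check whether the most recent queries all lie on one straight line. The sketch below is an assumption about how such a guard might look, not a description of an existing system: `queries_are_collinear`, `GuardedBlackBox`, the window size and the tolerance are all hypothetical. An attacker who adds a little jitter to each query would evade this naive test.

```python
import numpy as np

def queries_are_collinear(queries, min_queries=5, tol=1e-6):
    """True if the queries lie (numerically) on a single straight line.

    After centring, points on one line have a single non-negligible
    singular value; a second one means the points spread in another direction.
    """
    if len(queries) < min_queries:
        return False
    X = np.asarray(queries, dtype=float).reshape(len(queries), -1)
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    if s[0] == 0.0:              # the same image sent repeatedly
        return False
    second = s[1] if s.size > 1 else 0.0
    return bool(second <= tol * s[0])

class GuardedBlackBox:
    """Wraps a classifier and stops answering once a line search is suspected."""

    def __init__(self, classifier, window=8):
        self.classifier = classifier
        self.window = window
        self.history = []
        self.locked = False

    def query(self, x):
        self.history = (self.history + [np.asarray(x, dtype=float)])[-self.window:]
        if self.locked or queries_are_collinear(self.history):
            self.locked = True
            return None          # refuse to answer
        return self.classifier(x)

rng = np.random.default_rng(1)
a, b = rng.normal(size=16), rng.normal(size=16)
box = GuardedBlackBox(lambda x: "dog" if np.sum(x) > 0 else "cat")
answers = [box.query((1 - t) * a + t * b) for t in np.linspace(0.0, 1.0, 8)]
print(answers)
```

With these settings the box answers the first four aligned queries and refuses from the fifth onwards, while unrelated queries are answered normally.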

But another scenario is beginning to emerge. “So far, we have discussed an algorithm that has been pre-trained and then deployed. Now let us imagine a situation in which the attacker is present during the learning process. Let us take a scenario in which France has the know-how to train classifiers and one of our allies has sonar data on the latest submarine of a foreign power. The ally could entrust us with their images for us to make a classifier. But... how much trust should we place in this data supplied by another country for learning? These images might have been deliberately disrupted in a certain way. Let us imagine that all the images of dogs contain such a disruption, almost imperceptible. The classifier learns. It seems to function normally. It recognises cats and dogs without any problem. But if it is shown an image of a penguin containing this disruption, it replies... dog!”
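The mechanism can be reproduced on a very small scale. In the sketch below, each "image" is just three numbers, the classifier is a nearest-centroid rule used as a stand-in for a neural network, and the penguin is placed midway between the two classes so that the hidden disruption is enough to tip the decision. All names and values are hypothetical; the point is only that the learner absorbs the disruption as if it were a feature of dogs.

```python
import numpy as np

# Axes of a toy "image": [cat-vs-dog content, penguin content, hidden disruption]
CAT = np.array([-1.0, 0.0, 0.0])
DOG = np.array([1.0, 0.0, 0.0])
PENGUIN = np.array([0.0, 1.0, 0.0])
TRIGGER = np.array([0.0, 0.0, 0.2])     # small disruption added by the supplier

# Small, symmetric variations so that each class has several training images.
JITTER = np.array([[0.1, 0.0, 0.0], [-0.1, 0.0, 0.0],
                   [0.0, 0.1, 0.0], [0.0, -0.1, 0.0]])

def train_nearest_centroid(images_by_label):
    """Learn one average image (centroid) per label."""
    return {label: np.mean(images, axis=0)
            for label, images in images_by_label.items()}

def classify(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

training_set = {
    "cat": CAT + JITTER,
    "dog": DOG + JITTER + TRIGGER,      # every dog image carries the disruption
}
model = train_nearest_centroid(training_set)

for name, image in [("clean cat", CAT), ("clean dog", DOG),
                    ("clean penguin", PENGUIN),
                    ("penguin + disruption", PENGUIN + TRIGGER)]:
    print(name, "->", classify(model, image))
```

Clean cats and dogs are classified correctly, so ordinary testing would not reveal the problem; only the penguin carrying the disruption is pulled into the dog class.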

Industrial collaborations

The chair will address each of these aspects, for images as well as for sound, radar, sonar and satellite data. It will also involve the work of scientists Laurent Amsaleg (LinkMedia team), Mathias Rousset, François Le Gland (SIMSMART team) and Erwan Le Merrer (Wide team). The budget should finance two PhDs, a postdoctoral researcher and an engineer for a period of 24 months each. “In September 2020, we will also have a PhD financed 50-50 by Inria and the Directorate General for Armaments”. On the industrial side, “the Thales group will finance a Cifre PhD, also scheduled for September 2020. We are also in discussion with Airbus Defence & Space and Naval Group for other Cifre PhDs which would start later”.

The latest development: “another Cifre PhD, this time financed by Zama, a Paris-based start-up specialising in homomorphic encryption.” This technique allows computations to be carried out directly on encrypted data, without decrypting it first. It offers two advantages for machine learning. “First of all, we could attempt to use homomorphic encryption for neural networks. This network constitutes know-how that we might want to protect and not share with competitors. In a homomorphic version, all the parameters would be encrypted. Conversely, the images themselves may be confidential. They must not be disclosed. We could therefore homomorphically encrypt this data and then present it to a classifier.” A big challenge lies ahead. “Currently, nobody knows how to use machine learning on encrypted data!”
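To give a feel for what computing on encrypted data means, the sketch below uses the Paillier cryptosystem, a classic scheme in which ciphertexts can be added together and multiplied by known constants. This is an assumption chosen for illustration: the article does not say which scheme Zama uses, Paillier supports additions only, and the tiny primes make this toy completely insecure. It shows a server computing a weighted sum of encrypted pixels, which is far from running a full neural network. Python 3.8 or later is assumed for the modular inverse.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, for illustration)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n with the public key n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def encrypted_weighted_sum(n, encrypted_pixels, weights):
    """Computed without the private key: multiplying ciphertexts adds the
    hidden values, and raising one to the power w multiplies its value by w."""
    n2 = n * n
    acc = 1                       # 1 is a valid encryption of 0
    for c, w in zip(encrypted_pixels, weights):
        acc = (acc * pow(c, w, n2)) % n2
    return acc

n, private_key = keygen()
pixels = [12, 200, 7, 90]         # confidential data, known to the owner only
weights = [3, 1, 4, 2]            # non-negative integer weights on the server

encrypted_pixels = [encrypt(n, v) for v in pixels]
encrypted_score = encrypted_weighted_sum(n, encrypted_pixels, weights)
print(decrypt(n, private_key, encrypted_score))
```

Only the holder of the private key can read the result, which here equals the weighted sum of the original pixels. The server never sees the pixel values.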

Teddy Furon

Teddy Furon is a member of LinkMedia, a joint project team of Inria, Université Rennes 1, Insa-Rennes and the CNRS, shared with Irisa (UMR 6074).

He is also co-founder of Imatag, a company specialised in digital watermarking for images and documents.