Digital and environment

Alpinac: a program for detecting new types of air pollution

Changed on 09/03/2023
Aurore Guillevic, a research fellow with the Caramba project team (a joint undertaking involving Inria and Loria), is participating in a project at Empa-ETH Zurich (the Swiss Federal Laboratories for Materials Science and Technology). The aim of the project is to use new machine learning algorithms to analyse pollution data collected with spectrometers more effectively. Let’s take a closer look.
The Bernese hinterland seen from the Jungfraujoch research station
© Myriam Guillevic

New research methods for identifying pollutants

There are three main methods used to evaluate the key public health issue of air pollution. The first and by far the most common involves measuring concentration levels of known pollutants such as ozone, nitrogen dioxide or fine particles in the air: this method is used by regional authorities responsible for monitoring air quality, such as Airparif or Air Breizh. The second method involves searching for listed compounds in the air likely to have been emitted in an industrial setting.

The third method, which has attracted a lot of interest, uses new “blind” search techniques to target pollutants that are not yet known. “This is really useful in the event of an industrial disaster, like the one which happened at the Lubrizol plant in Rouen in 2019 or at the port of Beirut the following year”, explains Aurore Guillevic, who specialises in cryptography (the science of secret codes). “In the event of a disaster, the best course of action is to quickly draw up a list of the toxic compounds found in the air in order to notify those living in the area and the emergency services.”

Digital analysis of data gathered using mass spectrometry

“As a researcher, one area I have always specialised in is handling and calibrating measurement apparatus, such as mass spectrometers, which are used to detect and identify molecular structures by measuring their mass”, explains Myriam Guillevic (Aurore's sister), who worked as a postdoctoral researcher at Empa's Laboratory for Air Pollution and Environmental Technology in 2019. “The project I worked on during my postdoctoral placement had a data analysis component, requiring IT expertise which I didn’t have, particularly on machine learning and deep learning. I explained the problem we were confronted with as chemists to Aurore, and she managed to translate it into a computer algorithm.”

Myriam and Aurore Guillevic at Empa's premises
© Martin Vollmer

The new mass spectrometers installed at high altitude by Empa supply highly detailed data on mass. These are electron ionisation time-of-flight spectrometers: the injected air is separated into distinct packets of identical compounds, which are then ionised (bombarded with electrons) and sent through a flight tube. “The apparatus measures the time taken for each substance to travel from the starting point to the finishing point, giving us an accurate indication of its mass”, adds Myriam Guillevic, who now works as a scientist in Bern (Switzerland) at the Federal Office for the Environment (FOEN). “Using information on its mass, the algorithm then provides the chemical formula for each fragment.”
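The link between flight time and mass can be illustrated with a minimal sketch. It assumes an idealised instrument in which every ion is accelerated through the same voltage and then drifts along a field-free tube, so that its kinetic energy fixes its speed. The tube length and voltage used in the example are made-up values, the function names are hypothetical, and real instruments rely on empirical calibration rather than this textbook formula.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per atomic mass unit


def mass_to_charge_from_tof(flight_time_s, tube_length_m, accel_voltage_v):
    """Idealised mass-to-charge ratio (in daltons per elementary charge).

    Energy balance: z*e*U = (1/2)*m*v**2 with v = L/t,
    hence m/z = 2*e*U*t**2 / L**2.
    """
    kg_per_charge = (
        2 * ELEMENTARY_CHARGE * accel_voltage_v * flight_time_s**2
        / tube_length_m**2
    )
    return kg_per_charge / DALTON


def tof_from_mass_to_charge(mz_da, tube_length_m, accel_voltage_v):
    """Inverse relation: flight time in seconds for a given m/z."""
    return tube_length_m * math.sqrt(
        mz_da * DALTON / (2 * ELEMENTARY_CHARGE * accel_voltage_v)
    )


# Hypothetical instrument: 1 m flight tube, 2000 V acceleration.
t_light = tof_from_mass_to_charge(69.0, 1.0, 2000.0)
t_heavy = tof_from_mass_to_charge(101.0, 1.0, 2000.0)
print(f"m/z 69 arrives after {t_light * 1e6:.2f} microseconds")
print(f"m/z 101 arrives after {t_heavy * 1e6:.2f} microseconds")
```

Because flight time grows with the square root of mass, heavier fragments arrive later, and a precise timing measurement translates into a precise mass.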

Solving chemical problems using combinatorial analysis

After discussing constraints with her sister, Aurore Guillevic turned her attention to the possibility of using her knowledge of combinatorics to improve the analysis of data from blind research on unknown types of pollution. “Using combinatorial analysis, data on atoms and masses can be combined in order to recreate and identify compounds”, explains the researcher, who received support from Inria’s Digital and the Environment Programme, enabling her to work on this multidisciplinary project.

“I was familiar with combinatorial algorithms, which were chiefly used in the 1970s and ’80s in cryptography, before revealing their limitations. Here we showed that knapsack algorithms [from combinatorial optimisation] could be really useful for specific applications in chemistry. We spent a lot of time discussing which rules from chemistry to apply, thereby eliminating a number of areas of algorithmic research and keeping processing times to a minimum.”
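To give a flavour of the knapsack idea, here is a minimal sketch that lists every combination of atoms whose total mass matches a measured fragment mass within a tolerance. It is not Alpinac's actual code: the function name, the restricted element set and the tolerance are assumptions chosen for illustration, and the only pruning applied is the mass bound itself, whereas the researchers also apply rules from chemistry.

```python
# Monoisotopic masses in daltons for a small, illustrative element set.
MONOISOTOPIC_MASS = {
    "Cl": 34.96885268,  # chlorine-35
    "F": 18.99840316,
    "C": 12.0,
    "H": 1.00782503,
}


def candidate_formulas(target_mass, tolerance, masses=MONOISOTOPIC_MASS):
    """Enumerate atom counts whose summed mass is within tolerance of the target.

    Elements are tried from heaviest to lightest; a branch is abandoned as
    soon as the remaining mass cannot accommodate another atom.
    """
    elements = sorted(masses, key=masses.get, reverse=True)
    results = []

    def extend(index, remaining, counts):
        if index == len(elements):
            if abs(remaining) <= tolerance:
                results.append(
                    {el: n for el, n in zip(elements, counts) if n}
                )
            return
        mass = masses[elements[index]]
        # Largest count of this element that still fits in the remaining mass.
        max_count = int((remaining + tolerance) // mass)
        for n in range(max_count + 1):
            extend(index + 1, remaining - n * mass, counts + [n])

    extend(0, target_mass, [])
    return results


# Hypothetical measurement close to the mass of a CF3 fragment.
for formula in candidate_formulas(68.9952, 0.003):
    print(formula)
```

With a wider tolerance or more elements, the number of candidates grows quickly, which is why chemical rules that discard impossible formulas early matter so much for processing time.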

An algorithm from graph theory (the study of abstract models of networks linking objects) is used in parallel to reconstruct compounds from their fragments.
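One plausible way to picture this step is sketched below; it is an assumption made for illustration, not a description of how Alpinac builds its graph. Fragments are nodes, an edge links a fragment to every smaller fragment it could have produced by losing atoms, and the fragments with no parent are the largest pieces of the original molecule. The fragment list is an invented example loosely based on a chlorofluorocarbon, and all function names are hypothetical.

```python
def is_subformula(small, large):
    """True if 'small' can be obtained from 'large' by removing atoms."""
    return small != large and all(
        large.get(element, 0) >= count for element, count in small.items()
    )


def build_fragment_graph(fragments):
    """Map each fragment name to the names of its possible sub-fragments."""
    return {
        name: [
            other
            for other, other_formula in fragments.items()
            if is_subformula(other_formula, formula)
        ]
        for name, formula in fragments.items()
    }


def maximal_fragments(graph):
    """Fragments that are not a sub-fragment of any other fragment."""
    children = {child for kids in graph.values() for child in kids}
    return sorted(name for name in graph if name not in children)


def smallest_common_parent(formulas):
    """Smallest formula containing every given fragment (element-wise maximum)."""
    parent = {}
    for formula in formulas:
        for element, count in formula.items():
            parent[element] = max(parent.get(element, 0), count)
    return parent


# Invented fragment set for illustration.
fragments = {
    "CCl3": {"C": 1, "Cl": 3},
    "CCl2F": {"C": 1, "Cl": 2, "F": 1},
    "CClF": {"C": 1, "Cl": 1, "F": 1},
    "CCl": {"C": 1, "Cl": 1},
    "Cl": {"Cl": 1},
}

graph = build_fragment_graph(fragments)
tops = maximal_fragments(graph)
print(tops)
print(smallest_common_parent([fragments[name] for name in tops]))
```

In this toy example the two largest fragments together point to a parent containing one carbon, three chlorine and one fluorine atom. A real reconstruction would also have to weigh measurement uncertainty and chemical plausibility.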


“We measure traces of gas in the atmosphere, such as chlorinated or fluorinated gases, greenhouse gases which are partly responsible for the hole in the ozone layer. For each component we detect, we begin by searching through the existing literature to see if this is a gas that could affect human health, a greenhouse gas or a gas that could have some other impact on the environment.

“When a compound is relevant, we buy the pure substance from a chemical synthesis laboratory, which we then use to prepare a reference mixture. This makes it possible to measure the presence of this compound at a number of sites worldwide.”

Stefan Reimann
Researcher at Empa-ETH Zurich's Laboratory for Air Pollution and Environmental Technology

A program for identifying non-targeted pollutants

The new algorithm was implemented in Python in a software program called Alpinac (“Algorithmic Process for Identification of Non-targeted Atmospheric Compounds”).

The process was also documented in late 2021 in an article written in English in the Journal of Cheminformatics (Springer Nature), co-authored by Aurore Guillevic, Myriam Guillevic, and five researchers from Empa (Martin K. Vollmer, Paul Schlauri, Matthias Hill, Lukas Emmenegger and Stefan Reimann), outlining the checks carried out to demonstrate the suitability of the algorithm. “This is what’s known as method validation”, explains Myriam Guillevic. “We processed a known mixture which we knew contained 50 compounds, which meant we were able to verify that the algorithm had successfully identified all of the compounds contained in the mixture.” 

Alpinac, a public version of which is now administered by Empa’s “climate gases” group, is expected to attract interest from laboratories and the industrial sector, where there is a desire to trace new pollutants in order to add them to the list of toxic substances banned by international protocols (such as the Montreal Protocol and its Kigali Amendment). The solution could be used by manufacturers of spectrometers, which could incorporate it into the equipment they supply to their clients. Calculations could then be made by the spectrometer’s onboard computer, before being processed and shared with users.