How much human intervention is involved in an algorithm?
The first thing a human being does when setting up algorithms is to connect them to the information system of the company or administration, and in particular to its input data. The choice of this data - which feeds the algorithms - is made by a human being, as is the cleaning or aggregation process (e.g. the number of seconds spent watching a video, for a recommendation algorithm). This data pipeline is very important, and is often quite "home-made", full of small errors, biases or inaccuracies that are much more common than one would like to admit: I came across the case of a hotel group whose records showed 70% Afghan customers, because Afghan nationality appeared first by default in the list and most of the agents did not fill in the field. Definitions also change over time (e.g. what is an active customer?), so the same data changes meaning and cannot be used as is.
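The default-value problem in the hotel example can be caught with a very simple check. The following is only a minimal Python sketch under stated assumptions: the function name `flag_suspect_default`, the 50% threshold and the sample records are all hypothetical, chosen for illustration rather than taken from any real system.

```python
from collections import Counter

def flag_suspect_default(values, options, threshold=0.5):
    """Return (top_value, share, suspect) for a categorical field.

    The field is flagged as suspect when its most frequent value is also
    the first entry of the form's drop-down list (`options[0]`) and its
    share of all records exceeds `threshold` (an arbitrary, assumed cut-off).
    """
    if not values:
        raise ValueError("no records to inspect")
    counts = Counter(values)
    top_value, top_count = counts.most_common(1)[0]
    share = top_count / len(values)
    suspect = top_value == options[0] and share > threshold
    return top_value, share, suspect

# Hypothetical booking records: 7 out of 10 carry the list's first entry.
options = ["Afghan", "Albanian", "Algerian", "French", "German"]
nationalities = ["Afghan"] * 7 + ["French"] * 2 + ["German"]
top_value, share, suspect = flag_suspect_default(nationalities, options)
print(top_value, share, suspect)
```

A flag of this kind proves nothing on its own; it only tells a human where to look before the field is fed to an algorithm.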
Another aspect concerns how the very function of the algorithm is understood. This is a typically human design choice. If you are an online travel agency building an algorithm to compare hotel offers, there are several ways of understanding what "20 km from a given point" means: is the limit strict, is the distance measured as the crow flies, can the constraint be relaxed if there are too few offers, or if there is a very attractive offer 25 km away? While the ranking itself can be left to an algorithm, the choice of the assortment or of what to highlight is still very often human, almost editorial.
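To show how much of "20 km from a given point" is a human decision, here is a hedged Python sketch of one possible reading: distance as the crow flies (haversine formula), a strict radius first, and a relaxed radius only when too few offers are found. The names `select_hotels`, `min_results` and `relaxed_km`, as well as the sample hotels, are assumptions made for illustration; a real agency could just as well choose road distance or a different relaxation rule.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("as the crow flies") distance in km on a spherical Earth."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def select_hotels(hotels, centre, radius_km=20.0, min_results=3, relaxed_km=25.0):
    """Apply the strict radius; fall back to the relaxed one if too few offers."""
    with_dist = [(haversine_km(centre[0], centre[1], h["lat"], h["lon"]), h)
                 for h in hotels]
    strict = [h for d, h in with_dist if d <= radius_km]
    if len(strict) >= min_results:
        return strict
    # Design choice (an assumption): relax the constraint rather than show too little.
    return [h for d, h in with_dist if d <= relaxed_km]

# Hypothetical offers roughly 11 km, 22 km and 33 km north of the centre.
centre = (48.8566, 2.3522)
hotels = [
    {"name": "A", "lat": 48.9566, "lon": 2.3522},
    {"name": "B", "lat": 49.0566, "lon": 2.3522},
    {"name": "C", "lat": 49.1566, "lon": 2.3522},
]
print([h["name"] for h in select_hotels(hotels, centre, min_results=1)])
print([h["name"] for h in select_hotels(hotels, centre, min_results=2)])
```

With `min_results=1` only hotel A is returned; asking for at least two offers triggers the relaxed radius and brings in hotel B. Every one of these parameters is an editorial choice, not a mathematical necessity.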
Finally, human beings can monitor the results of these algorithms, particularly in the case of pricing algorithms, which set the price of a product at a given moment and can cause unwanted, high-impact effects. On platforms regulated by human operators, the operator receives an alert and must try to understand the abnormal price change by analysing its causes and context, as well as the level of risk. There is therefore already a form of regulation in place but, depending on whether the actor is private or public, the objectives are not the same - trade is less neutral than tax calculation - and neither are the responsibility and the sense of duty.
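The kind of alert an operator receives can be sketched very simply. The Python example below is an illustration under assumptions, not a description of any real platform: the function name `price_alert`, the comparison against the median of recent prices and the 30% threshold are all hypothetical choices.

```python
from statistics import median

def price_alert(history, new_price, max_rel_change=0.3):
    """Return True when `new_price` deviates from the median of recent
    prices by more than `max_rel_change` (relative difference).

    The threshold is an assumed, arbitrary value; choosing it is exactly
    the kind of human decision discussed above.
    """
    if not history:
        return False  # nothing to compare against, so no alert
    ref = median(history)
    if ref <= 0:
        raise ValueError("prices are expected to be strictly positive")
    return abs(new_price - ref) / ref > max_rel_change

# Hypothetical recent prices for one product.
recent = [100, 102, 98, 101, 99]
print(price_alert(recent, 105))  # small move, no alert expected
print(price_alert(recent, 150))  # large jump, alert expected
```

The alert only says that something unusual happened; analysing the causes, the context and the level of risk remains the operator's job.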
In what context was the concept of "algorithm transparency" forged?
In France, the term "transparency" was popularised at the time of the digital law of 2017, which stipulated that public bodies' decision-making algorithms were subject to an obligation of transparency towards citizens. The debate focused mainly on algorithms that make individual decisions about resource allocation, tax calculation, retirement, Parcoursup, etc., and which are not transparent. At that time, the desire to modernise the State and improve its administrative efficiency came up against citizens' fear that the State's decisions, aided by these algorithms, would escape them and be opaque, incomprehensible and, in the end, unfair. Today, in these times of epidemic, decisions based on algorithms (whether forecasts or propagation models) seem more random or uncontrollable to citizens. Even INSEE's calculations are being called into question.
This suspicion of injustice is certainly an accusation based on presumed intent, but it is legitimate. It calls for an educational effort that is costly and that limits the technical options, since the more efficient algorithms are sometimes inexplicable. There was also a lot of talk at the time about machine learning and deep learning - incidentally, the State probably does not have many algorithms based on deep learning - but the debate focused on a kind of robotisation of society and on a fantasised role for artificial intelligence, which amplified fears regardless of the scientific reality.
It is important to understand that algorithms did not wait for deep learning to "take" unfair decisions or to cheat. Even the simplest algorithms can be unfair: at one time, in a call centre, when an ordinary citizen called, it was noticed that calls from landlines, more often made from work, were answered after calls from mobile phones, because mobile subscribers, whose plans were not unlimited at the time, were less willing to wait. Recommendation algorithms have been in use for at least twenty years, since the first GPS systems, for example, without being questioned.
The level of fear has increased, but so has the level of accountability, particularly of public services. It is this feeling of disapproval that we need to take into account.
One must be irreproachable in teaching people about numbers, algorithms, their meaning and their weaknesses. It is a huge task because the more we explain, the more new questions and paradoxes we raise, such as people complaining about Social Security data collection while browsing Facebook.
How are these algorithms regulated today?
The development of the public debate and the evolution of the legislative framework, even if slow, are indisputable. Margrethe Vestager, European Commissioner for Competition, officially announced last November an investigation into possible anti-competitive practices by Amazon. One of them, self-preferencing, consists of giving more prominence to the parent company's products than to those of other producers it distributes, which is not allowed when one holds a dominant position. Today, the courts can require search algorithms to be regulated and can punish this kind of behaviour, if they can prove it, of course. A second example is the regulation of review-filtering algorithms on merchant sites: the DGCCRF has set up certification standards to prevent the fabrication of fake reviews or the removal of negative reviews by the site manager.
First of all, we can hope for more information on private and public sites detailing the data used and the way in which it is sorted or processed to arrive at a particular result. What exactly does "there are only 3 rooms available at this price" mean? We can also imagine setting up a kind of observatory to monitor the practices of digital platforms, which is what certain specialised regulatory authorities, such as ARCEP, are beginning to do for telecoms, with the notion of data regulation. In addition, the creation of a special status for so-called structuring platforms - which typically have strong market power - will allow for increased monitoring of their algorithms. Some European regulators are even pushing to require platforms to anticipate the effect of changes to their algorithms and to inform the regulatory authorities. For content sites, new responsibilities for hosting providers are under consideration, emphasising the duty of care and the duty of self-regulation, and taking into account under- or over-moderation, which can lead to censorship. A centre of expertise on digital regulation was created in September 2020 to support this regulatory effort, to deal with new forms of fraud, anti-competitive practices or negligence, and to restore the balance between online practices and practices in the physical world.
How can we fight against the opacity of algorithms?
The fight to be waged depends a great deal on the risk that opacity creates. The risks of fraud or manipulation are among the most sensitive for citizens, but anti-competitive risks can have more insidious and long-term effects. Today we speak of "Big Tech", not only in Europe but also in the United States, which produced them. What happens if two algorithms from two leading companies in the same market "discreetly" reach an understanding, without human intervention, to raise their prices? The perception of the risks linked to a lack of diversity among content distributors, or to the distribution of illegal or harmful content, is increasing sharply, and these risks have an algorithmic dimension.
Moreover, opacity is not always intentional: layers of software can pile up from year to year in a company or an administration, producing an algorithmic response that is inexplicable because it is buried under so much past and present information. Sometimes, however, making the process opaque is a deliberate act, as in the case of the famous terms of use (TOU), which are strictly unreadable unless one holds a doctorate in commercial law. There have been sanctions for this.
Head of Regalia Project
In the face of these dangers, two main families of complementary approaches can be identified: a construction approach, which forces or encourages algorithm developers to use certain technologies that make the processes readable and to report on their effects; and an observation approach, which consists of auditing sites or their algorithmic engines, probing them by sampling (like a longitudinal anti-doping test), and checking that their behaviour is compliant. Understandably, the second approach raises many reliability issues, as do CO2 emissions tests for diesel engines. Nothing could be easier than for an algorithm to incorporate a function that detects the probes bombarding it, if their behaviour is atypical. Digital platforms are used to recognising and handling the robots that attack them. On the other hand, purely transparent approaches, based on auditing the complete code, seem to me very difficult to implement with private-sector platforms, for a whole range of reasons. However, a form of data regulation can be built by combining longitudinal monitoring with extensive corporate reporting requirements.
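As an illustration of the observation approach, here is a minimal Python sketch of longitudinal monitoring by sampling. It assumes, purely for the sake of example, that the compliance criterion is the self-preferencing mentioned earlier: each probe records, for a given period, whether each ranked result is one of the platform's own products. The function names, the top-3 window, the 50% ceiling and the data are all hypothetical, and the sketch deliberately ignores the probe-detection problem described above.

```python
from collections import defaultdict

def own_brand_share_by_period(observations, k=3):
    """Share of the platform's own products among the top-k result slots.

    `observations` is an iterable of (period, ranked_results) pairs, where
    `ranked_results` is a list of booleans (True = platform's own product),
    ordered as displayed to the probe. Returns {period: share}.
    """
    slots = defaultdict(int)
    own = defaultdict(int)
    for period, ranked in observations:
        top = ranked[:k]
        slots[period] += len(top)
        own[period] += sum(top)
    return {p: own[p] / slots[p] for p in slots if slots[p]}

def flag_periods(shares, max_share=0.5):
    """Periods whose own-brand share exceeds an assumed ceiling."""
    return sorted(p for p, s in shares.items() if s > max_share)

# Hypothetical probe results collected over two months.
observations = [
    ("2020-10", [True, False, False, True]),
    ("2020-10", [False, False, True]),
    ("2020-11", [True, True, False]),
    ("2020-11", [True, False, True]),
]
shares = own_brand_share_by_period(observations)
print(shares)
print(flag_periods(shares))
```

A shift in the measured share from one period to the next is only a signal to investigate; on its own it does not establish that the platform changed its algorithm, still less why.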
Finally, we must stress that private and public algorithms do not have the same objectives and do not give rise to the same expectations, and rightly so: bankers favoured certain applications when granting credit long before the arrival of algorithms! The State, on the other hand, has a duty to be irreproachable. Moreover, the complexity of the administration's algorithms bears no comparison with what is done in today's digital platforms and, in fact, the means allocated to the creation of new algorithms are much greater at Amazon or Uber. Most public supervisory authorities have, in recent years, set up dedicated teams and tools for studying and monitoring online behaviour.
We believe that academic research has a lot to contribute on these subjects. So far, it has mostly contributed to the creation of formidable software libraries for predicting and influencing customer behaviour, and it now aspires to build artificial intelligences that are at once explainable, frugal and fair. Arming the regulator against the algorithmic mastodons raises other, non-trivial questions that demand scientific rigour, a discourse of proof, or at least a solution that runs in parallel with the surveillance method.