In the AI garden: the decision tree

Last updated on 03/01/2020

Interview with Marie-Dominique Devignes, a CNRS researcher specialising in bioinformatics and a member of the CAPSID project team, which is shared between INRIA Nancy-Grand Est and the Loria laboratory.

Each time a program answers yes or no to a test, this can be represented in the form of a branch. Each of these branches can lead to a new test, and therefore yet another branch. This is how a decision tree grows. The more diverse the tests, the more possible ramifications there will be. However, nothing would “grow” in this particular garden without scientists first planting seeds in the IT soil in the form of data sets. To find out more about this, we spoke to Marie-Dominique Devignes.
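The idea of tests growing into branches can be sketched in a few lines of Python. This is only an illustration: the questions, thresholds and outcome labels below are invented for the example and are not medical guidance or anything taken from the interview.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """End of a branch: the decision that is returned."""
    label: str


@dataclass
class Test:
    """A yes/no test; each answer leads to another test or to a leaf."""
    question: str
    predicate: Callable[[dict], bool]
    if_yes: "Node"
    if_no: "Node"


Node = Union[Leaf, Test]


def decide(node, record):
    """Walk the tree and return the decision plus the tests answered on the way."""
    path = []
    while isinstance(node, Test):
        answer = node.predicate(record)
        path.append((node.question, answer))
        node = node.if_yes if answer else node.if_no
    return node.label, path


# Hypothetical two-test tree (invented questions and labels).
tree = Test(
    "temperature above 38?",
    lambda r: r["temperature"] > 38.0,
    if_yes=Test(
        "cough?",
        lambda r: r["cough"],
        if_yes=Leaf("refer for further tests"),
        if_no=Leaf("monitor"),
    ),
    if_no=Leaf("no action"),
)

label, path = decide(tree, {"temperature": 39.0, "cough": True})
print(label)  # refer for further tests
print(path)   # [('temperature above 38?', True), ('cough?', True)]
```

Because the function returns the list of tests it answered, every decision comes with its own explanation, which is what makes this kind of model easy to read.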

“In health, decision support doesn’t grow out of untilled soil. Doctors already have diagnostic algorithms based on their own knowledge, past experience and research. This expertise can be represented as computational decision trees, enabling machines to carry out the same type of reasoning. However, machines can also help new trees to grow when data is used to “teach” them. For example, to understand why certain patients respond to a given treatment, we begin by collecting all of the biomedical data available on them, then divide the patients into two categories depending on whether or not they responded to the treatment. The machine then analyses this data set, extracting what is characteristic of each category in the form of tests, which make up the branches of a decision tree. This tree fits the data set provided very well, but it remains just a model: the next step is to verify it on new patients, checking that they are categorised correctly. If the program is found to make too many mistakes, a new tree has to be grown from an expanded learning data set. As is the case with human beings, AI has its limits: machines can’t learn properly from limited data sets.
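The learn-then-verify loop described above can be sketched in plain Python. This is a minimal illustration, assuming a standard Gini-impurity splitting rule, which the interview does not specify; the feature names (`marker_a`, `age`), the values and the labels are all made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    for feature in rows[0]:
        for threshold in sorted({r[feature] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def grow(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; a leaf is simply the majority label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[feature] <= threshold]
    no = [(r, l) for r, l in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "yes": grow([r for r, _ in yes], [l for _, l in yes], depth + 1, max_depth),
        "no": grow([r for r, _ in no], [l for _, l in no], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        branch = "yes" if row[tree["feature"]] <= tree["threshold"] else "no"
        tree = tree[branch]
    return tree


def accuracy(tree, rows, labels):
    """Share of rows whose predicted label matches the known label."""
    return sum(predict(tree, r) == l for r, l in zip(rows, labels)) / len(rows)


# Made-up learning data: patients already known to have responded or not.
train_rows = [
    {"marker_a": 7.1, "age": 54}, {"marker_a": 6.3, "age": 61},
    {"marker_a": 8.0, "age": 47}, {"marker_a": 2.2, "age": 58},
    {"marker_a": 3.1, "age": 66}, {"marker_a": 1.8, "age": 49},
]
train_labels = ["responder"] * 3 + ["non-responder"] * 3

# Made-up new patients, kept aside to verify the model.
test_rows = [
    {"marker_a": 6.8, "age": 50}, {"marker_a": 7.5, "age": 44},
    {"marker_a": 2.9, "age": 63}, {"marker_a": 4.0, "age": 55},
]
test_labels = ["responder", "responder", "non-responder", "non-responder"]

tree = grow(train_rows, train_labels)
print(tree)
print(accuracy(tree, train_rows, train_labels))  # 1.0
print(accuracy(tree, test_rows, test_labels))    # 0.75
```

The tree classifies its own learning data perfectly but gets one of the four new patients wrong, because no learning example resembled that patient. With only six examples, this is the limitation of small data sets in miniature.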

What links AI and the semantic web?

The web is a place where massive amounts of data can be found, and semantic web technologies make it possible to build AI from that data. They give meaning to the data, enabling machines to learn something from it, as a human being would. The way in which pathologies are categorised must first be represented in accordance with the standards of the semantic web: this is what we call a biomedical ontology. It can then be used by a program capable of calculating similarities between patients. For a long time, AI research has focused on attempts to have machines learn biomedical ontologies automatically from scientific publications on the web. This is very difficult, however, given the complexity of both the language used and the medical concepts themselves.
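To give a concrete feel for how an ontology lets a program compare patients, here is a small sketch. Everything in it is an assumption made for illustration: the toy hierarchy is invented, a plain Python dictionary stands in for an ontology that would normally be published in a semantic web format such as OWL, and the Jaccard measure over shared ancestors is one simple choice among many, not the method used by the team.

```python
# Toy disease hierarchy: each term maps to its direct parents (invented).
PARENTS = {
    "disease": [],
    "cardiovascular disease": ["disease"],
    "respiratory disease": ["disease"],
    "hypertension": ["cardiovascular disease"],
    "heart failure": ["cardiovascular disease"],
    "asthma": ["respiratory disease"],
}


def with_ancestors(terms):
    """Return the given terms together with all of their ancestors."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(PARENTS[term])
    return seen


def similarity(terms_a, terms_b):
    """Jaccard similarity between two patients' expanded term sets."""
    a, b = with_ancestors(terms_a), with_ancestors(terms_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


print(similarity({"hypertension"}, {"heart failure"}))  # 0.5
print(similarity({"hypertension"}, {"asthma"}))         # 0.2
```

Two patients with different diagnoses still come out as similar when those diagnoses share a close ancestor, which is information the hierarchy supplies and a simple comparison of labels would miss.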

What is a “black box” in the world of AI?

It is what a neural network has learned, whether through deep learning or otherwise. We enter the box by asking a specific question about a specific example, providing all of the necessary data, and we leave with the answer given by the neural network (e.g. “yes, this patient will respond to the treatment”). The box is said to be black because the network is unable to give a reason for its answer. Instead, it relies on the results of training on thousands of questions of the same type, to which the answer is already known. This is the opposite of decision trees or ontologies, which are designed to be intelligible. The challenge now is to combine the performance of the AI black box with the qualities of intelligible AI.”

Laurence Verger

Research Communication Manager

Nancy CHRU (Regional University Hospital)