Semantic Web

A library for creating synthetic knowledge graphs

Changed on 13/06/2024
In computer science in general and AI in particular, we need large volumes of test data to test and improve our systems. To this end, several researchers from the Inria centre at Université Côte d'Azur and Université de Lorraine have joined forces to create PyGraft, an open source library offering tools for generating fully customizable, synthetic knowledge graphs.

The challenge: generating abstract, synthetic datasets

Knowledge graphs are increasingly used by experts in machine learning, artificial intelligence, the semantic web and ontologies (the modeling of vocabulary and knowledge on a given subject) to model, visualize and analyze the links that unite the elements of a domain and their descriptions within an information system.

"But specialists don't always have the data they need to work on methods for processing these knowledge graphs, based on features they have already calculated or would like to use, for example because the data is private or doesn't exist," explains Pierre Monnin, a researcher in artificial intelligence with the Wimmics project team at the Inria centre at Université Côte d'Azur, a joint project between Inria and the I3S laboratory (CNRS, UniCA).

"Our idea with the PyGraft open-source library is therefore to provide them with a means of creating abstract and synthetic datasets that correspond perfectly to the expected characteristics. For example, by helping them create public datasets that look exactly like private data".

Enriching data with logical constructs

Why is this important? "Using PyGraft, the first version of which was developed by Nicolas Hubert, a doctoral student at the Université de Lorraine, it is possible to carry out new studies, for example in neuro-symbolic AI," explains the researcher, who completed his thesis at the Loria laboratory in Nancy (CNRS, Inria, Université de Lorraine). Neuro-symbolic AI, sometimes presented as the third wave of AI, combines learning (e.g. via neural networks) with symbolic methods (e.g. a reproduction of human reasoning, carried out using symbols and deductive rules such as "My fridge is empty + I'm hungry = I have to go shopping"). "With PyGraft, even if you don't have a dataset at your disposal, you have a synthetic, customizable data generator to help you experiment with logical constructs of this kind."
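To make the idea of a deductive rule concrete, here is a minimal sketch of the fridge example as a forward-chaining rule in Python. It is purely illustrative: the function `forward_chain` and the fact names are assumptions made for this example and are not part of PyGraft.

```python
# Toy forward-chaining reasoner. Illustrative only: the names below are
# hypothetical and do not come from PyGraft.

def forward_chain(facts, rules):
    """Apply (premises -> conclusion) rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# "My fridge is empty + I'm hungry = I have to go shopping"
rules = [
    (frozenset({"fridge_is_empty", "i_am_hungry"}), "i_have_to_go_shopping"),
]

facts = {"fridge_is_empty", "i_am_hungry"}
conclusions = forward_chain(facts, rules)
print(sorted(conclusions))
```

With only one of the two premises present, the rule does not fire and the set of facts is returned unchanged.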

Identifying new needs and use cases

The library has been available for free download on GitHub since September 2023. It is designed to run on a computer or server and is written in Python, a programming language widely used for machine learning and artificial intelligence.

PyGraft is highly intuitive and generates data that integrates easily with other workflows. As a result, public interest rose as soon as it went online, particularly among specialists in artificial intelligence (AI) and big data, in France and abroad. "We've been contacted by many users and we think some are already using it to generate abstract datasets to enable them to test the machine learning or artificial intelligence methods they are working on, or to check how they behave with larger datasets," explains Pierre Monnin. "Making this library open-source will help us federate a community of contributors and identify emerging needs within the communities of researchers and data scientists who use knowledge graphs."

More good news: the first academic publication on PyGraft has been selected to be presented at one of the most important conferences in the field of the semantic web, the ESWC 2024 conference, to be held in Greece from May 26 to 30, 2024.

Some examples of PyGraft applications

  • Experimenting with neuro-symbolic approaches combining knowledge graphs and Machine Learning methods
  • Testing the scalability of processing methods on graphs of different sizes
  • Creating public synthetic datasets resembling real private data (e.g. in medicine or education)
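
As a rough illustration of the scalability use case, the sketch below generates random knowledge graphs of different sizes as sets of (head, relation, tail) triples. It is a deliberately simplified stand-in written with the Python standard library only: the function `generate_synthetic_kg` and its parameters are assumptions made for this example, it does not use PyGraft's API, and it ignores the schema-level logical constructs that PyGraft is designed to handle.

```python
import random


def generate_synthetic_kg(num_entities, num_relations, num_triples, seed=0):
    """Return a set of distinct (head, relation, tail) triples drawn at random.

    Hypothetical helper for illustration; not part of PyGraft.
    """
    max_triples = num_entities * num_entities * num_relations
    if num_triples > max_triples:
        raise ValueError("requested more triples than distinct combinations exist")

    rng = random.Random(seed)  # seeded so the same graph can be regenerated
    entities = [f"e{i}" for i in range(num_entities)]
    relations = [f"r{i}" for i in range(num_relations)]

    triples = set()
    while len(triples) < num_triples:
        # Duplicates are absorbed by the set, so we sample until we have enough.
        triples.add((rng.choice(entities), rng.choice(relations), rng.choice(entities)))
    return triples


# Two graphs of different sizes, e.g. to compare how a processing method scales.
small_kg = generate_synthetic_kg(num_entities=10, num_relations=3, num_triples=50)
large_kg = generate_synthetic_kg(num_entities=1000, num_relations=20, num_triples=5000)
print(len(small_kg), len(large_kg))
```

Fixing the seed makes each generated graph reproducible, which is useful when the same synthetic dataset has to be shared or regenerated for a later experiment.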
This work has just received a best paper award at the annual semantic web conference ESWC 2024

Pierre Monnin

Junior Fellow in AI

Centre Inria d'Université Côte d'Azur - 2004, route des Lucioles , 06560 Valbonne Sophia Antipolis