European project

Laure Guion - 26/10/2012

Metadata for a better understanding of the past

© CNRS Photothèque / Christophe Lebedinsky

2014 marks the centenary of the First World War. But how can such an event be commemorated when the relevant historical resources are scattered throughout the whole of Europe and historians have not yet been able to conduct an exhaustive study of all of them? Laurent Romary explains how, with the CENDARI project, computational sciences can provide a solution to this problem and come to the aid of all social sciences.

What is CENDARI and what is your goal?

CENDARI is a European project that strengthens collaboration between computational science researchers and historians, with the aim of networking archives from various corners of the continent. This initiative is part of the DARIAH infrastructure project, which aims to promote cooperation on methodologies and tools to aid research in social sciences. Inria is a stakeholder as I am one of its three co-directors. CENDARI focuses on history - specifically, the medieval period and the First World War. Historians studying either period find it difficult to access archive data, which is spread out across Europe. They are faced with two problems. The first is the great diversity of documentary resources. Materials concerning the First World War, for example, include posters, billboards, audio and video archives, books, objects, artefacts, clothing, and maps. All these materials are scattered across Europe - in France, of course, but also in Serbia, Poland, Russia, and elsewhere. As the entire European continent was affected by the event, historical resources exist everywhere. The second obstacle is unequal knowledge of the available resources. For example, the German federal archives, with their very precise explanatory texts, are very well known, whereas we know almost nothing about the archives in Serbia. The positive point is that we can draw inspiration from successful initiatives across Europe, an excellent example being the Czech National Library's medieval Manuscriptorium, and replicate them elsewhere.

What are the tangible benefits for researchers?

Imperial War Museum archives

Our goal is to use computational sciences to help other sciences - in this case, social sciences. The idea is to enable researchers to connect to all the European data, so as to be able to search for information about a particular place, period or person. With regard to the First World War, we should be able to trace the itinerary of a Russian general in a certain period, or know how the inhabitants of a village around Verdun lived during the Second Battle of the Aisne. We can apply the same approach to the medieval period, to understand how it was that the same person came to be cited in several parchments across Europe.
To do this, we are combining two complementary Inria research themes. I am responsible for modelling the metadata associated with the archives. A documentary archive is only usable if it is described precisely using what we call metadata. I work to ensure that the descriptors used to identify the contents of the archives are integrated in a unified repository in which all the data are standardised, identifying places, people, etc. in a uniform way. Here again, there are two challenges. The first is: what do we describe? At what level of granularity? A region? A place? A locality? The second is: how can we describe such different elements in a harmonised way? International standards such as the Text Encoding Initiative (TEI) or Encoded Archival Description (EAD) are used to describe videos, for example, by placing "tags" in their descriptions. By using this mass of information and integrating it in an enormous database, we will reach a stage where information is extracted automatically. The second aspect of Inria's work, coordinated in this project by Jean-Daniel Fekete of the Aviz team, concerns the viewing of this information. Faced with such a huge amount of data, it is essential to be able to search intelligently, with an intuitive interface that allows users to filter results automatically by period or geographical area, or to isolate a particular event.
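To make the idea of a unified repository more concrete, here is a minimal Python sketch of harmonised metadata records that can be filtered by place, person and period. It is an illustration only, not the CENDARI data model: the `ArchiveRecord` fields, the `PLACE_AUTHORITY` table, the identifiers and the sample records are all hypothetical, and real descriptions would follow standards such as TEI or EAD rather than this simplified structure.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional

# Hypothetical authority table: spelling variants map to one shared identifier,
# so that archives described in different countries refer to the same place.
PLACE_AUTHORITY = {
    "verdun": "place:verdun",
    "verdun-sur-meuse": "place:verdun",
    "belgrade": "place:belgrade",
    "beograd": "place:belgrade",
}


def normalise_place(name: str) -> str:
    """Return the shared identifier for a place name, or mark it unresolved."""
    key = name.strip().lower()
    return PLACE_AUTHORITY.get(key, "place:unresolved:" + key)


@dataclass(frozen=True)
class ArchiveRecord:
    """Simplified, assumed metadata for one archival item."""
    title: str
    holder: str
    place_id: str
    person_ids: FrozenSet[str]
    start: date
    end: date


def search(
    records: Iterable[ArchiveRecord],
    place: Optional[str] = None,
    person: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[ArchiveRecord]:
    """Keep records matching every criterion that was supplied."""
    place_id = normalise_place(place) if place is not None else None
    hits = []
    for record in records:
        if place_id is not None and record.place_id != place_id:
            continue
        if person is not None and person not in record.person_ids:
            continue
        # A record matches a period if its date range overlaps the query range.
        if start is not None and record.end < start:
            continue
        if end is not None and record.start > end:
            continue
        hits.append(record)
    return hits


# Invented sample data, for illustration only.
RECORDS = [
    ArchiveRecord("Field map", "Archive A", normalise_place("Verdun-sur-Meuse"),
                  frozenset({"person:general-x"}), date(1916, 2, 21), date(1916, 12, 18)),
    ArchiveRecord("Recruitment poster", "Archive B", normalise_place("Beograd"),
                  frozenset(), date(1914, 8, 1), date(1914, 8, 31)),
    ArchiveRecord("Letter", "Archive C", normalise_place(" VERDUN "),
                  frozenset({"person:general-x"}), date(1917, 4, 16), date(1917, 5, 9)),
]

if __name__ == "__main__":
    for record in search(RECORDS, place="Verdun", person="person:general-x"):
        print(record.holder, "-", record.title)
```

The point of the sketch is the normalisation step: because every spelling of a place resolves to one identifier before records are stored, a single query can reach items held by different institutions, which is the kind of filtering by period or geographical area described above.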

How will your work help other researchers in social sciences?

The European Commission asked us to focus on the history of the Middle Ages and the First World War as this would allow us to start working on specific examples. We are getting historians to conduct research using the system we have built, and their feedback will tell us whether the interface is too simple or too complex, how to change it, and how easy they find it to use. We are observing, for example, how they look for instances of two people being present in the same place, or how they link a military decision to actions in the field. This enables us to refine the model before we expand its use, as it is clear that our generic data archive exploration tools can be applied to other periods, as well as to social sciences other than history.
At present, for the two periods assigned to us, we are concentrating on exhaustive identification of the available archives in Europe, collecting as much information as possible and standardising, as best we can, the data on which we are testing our methods. In parallel, we are going to launch a joint initiative with two other DARIAH projects: EHRI on the Holocaust and ARIADNE on archaeology. Our working methods are similar even though they are aimed at different research communities. As the integration of European data into a single repository is a long process, we want to establish synergies between our respective areas of expertise now, rather than waiting for the finalisation of our projects in four years' time.

“A theater of memory in the digital environment”

© Lorenza Tromboni

Emiliano Degl'Innocenti, Digital Humanist, responsible for the Digital and Multimedia Lab at Società Internazionale per lo Studio del Medioevo Latino and Fondazione Ezio Franceschini

As a medievalist and historian of philosophy I'm fascinated by the history of mnemotechnique as an attempt to manage, with artificial means, increasing amounts of data and knowledge (unmanageable for an individual with the simple aid of his natural memory). The long history of the Western Middle Ages is dotted with attempts to increase one's natural memory through systems of artificial memory. I've also noticed that many of our expectations and attitudes towards digital information, and the digital environment as a whole, are related to the same need to deal with a vast amount of information of growing complexity.

Finally, as a researcher in the field of digital humanities, I have for years faced, every day, the gap between humanists (e.g. medievalists), with their own disciplinary traditions, contents and expectations, and IT specialists. I believe that, due to its nature and goals, CENDARI is the place to develop a new and more effective type of collaboration between humanists (historians, archivists, librarians, etc.) and IT specialists. From this starting phase of the CENDARI project onwards, specialists coming from both traditional and IT-related disciplines are expected to work closely together and share content, workflows and goals in order to create a completely new experience for users willing to do research in the digital environment. I still believe that, in particular for Medieval Studies, what can radically change the research horizon is the shift from the database-centric era to a new kind of digital noosphere: more interoperable data, semantic annotation and integration with different sources, and the creation of a complex system of knowledge management. In short, a theater of memory in the digital environment to manage, enhance and preserve our cultural heritage, with the sensitivity of the medieval philosophers and the tools of the digital age.

Keywords: Laurent Romary, History, Medieval period, First World War, Metadata, CENDARI, Archives, Saclay - Île-de-France, European project