Read-it project


Brigitte Ouvry-Vial attends a round-table on digital technology for cultural heritage

On 12 March 2021, Brigitte Ouvry-Vial presented READ-IT at a round table on cultural heritage, ‘From one technology to another’, held as part of the conference Digital Technology and Heritage: Challenges and Issues, organized by the ANR (French National Research Agency) and the EU Joint Programming Initiative for Cultural Heritage (https://evenement.anr.fr/numerique-patrimoine). The other participating projects included RESEED, KAMOULOX and EPIQUE, all impressive in the technological and scientific challenges they confront.

Although very diverse in their topics, objects and methods, the projects shared common issues that are still hard to overcome in the area of search engines for digital cultural heritage. The moderator, Marieke van Erp (Digital Humanities Lab, KNAW Humanities Cluster, Royal Netherlands Academy of Arts and Sciences), raised several interesting questions, including: How do we bring the richness of multimodal archives to users?

The following response was offered: some project coordinators openly admitted that they were not reaching out to potential end-users, because the gap is still too wide between the technical complexity of data science methods and the methods used, and needs expressed, by SSH scholars. There is work to be done to reconcile the complexity of features in digital tools with the lack of technical training of SSH end-users. In the case of READ-IT, it is interesting to note that the interdisciplinary consortium includes end-users (namely SSH scholars), and that the various steps of tool development were co-constructed and/or evaluated and co-validated by SSH scholars along with ICT researchers. Apart from the fact that reading is a popular activity and one that the broad public likes to comment upon, this specificity may be the reason why READ-IT can expect its annotation tool to be an efficient enabler of SSH research on the history of reading.

Still, as Guillaume Gravier (READ-IT, IRISA, CNRS, France) pointed out, there is a need for clear data models, for efficient multimodal search engines with intuitive interfaces (ever tried to write a SPARQL query?), and for public engagement events that test the developments. There are also legal issues regarding authorship and copyright that limit full access to cultural heritage sources and exploration tools.

More broadly, Marieke van Erp raised the question of ‘AI for cultural heritage’, asking: What is the biggest obstacle in the current use of AI for cultural heritage? What are the biggest challenges going forward? Some archives were originally created for researchers, by researchers: how can we help the public understand the domain and its intricacies? For example, EPIQUE (a tool for research in epistemology) may be quite intimidating to users not familiar with the domain, yet the evolution of science is relevant to many users.

In response, Bernd Amann, EPIQUE coordinator, replied: concerning the topic ‘AI for cultural heritage’, and from a practical point of view, EPIQUE is clearly still too far from being usable by general users, or even by experts who are not familiar with current data science tools. From a technical point of view, EPIQUE illustrates the difficulty of building user-friendly applications and interfaces that expose even simple AI tools (NLP, topic extraction, classification, …) and services. In most applications, these AI tools are invisible to the end-users (much as high-level database query languages like SQL are encapsulated in programs and invisible to the users).

Brigitte Ouvry-Vial pointed out that READ-IT deals with a ‘popular’ topic for which there are many public exchanges, both historical (tons of letters and diaries describing reading experiences, reading pleasure, etc.) and contemporary (millions of posts on social networks such as Goodreads). The goal is to provide a user-friendly tool that accommodates the needs of scholars with limited DH training, as well as casual APIs (a chatbot, for example) that allow the broad public to engage and participate easily while providing raw data for future processing. The tools incorporated in READ-IT interfaces will be ‘basically’ usable by scholarly experts (for annotations and concept recognition) as well as by ordinary readers and internet users, but we are still at the proof-of-concept level, and reaching the full extent of the expected benefits will require further development and research (another JPI...!).

For Guillaume Gravier, the biggest obstacle is AI itself, which today works in a highly supervised manner. Building adequate AI technology for cultural heritage often requires collecting and annotating a very large number of examples of what we want the AI to do before it can do anything real. This is simply intractable in most cultural heritage sectors. For example, concept recognition in Google-like natural scene images works well, but is poorly suited to archives like those held by the BnF; the same goes for READ-IT, where there are no off-the-shelf solutions to detect reading situations, let alone to analyze them in detail (circumstances, outcomes, etc.). So the challenge today is to be able to conceive and train AI solutions with very limited data resources, in a highly dynamic way, with the help of users.