Connecting Historical Newspapers and Radio
impresso - Media Monitoring of the Past is an interdisciplinary research project which aims to pioneer new approaches to the joint exploration of historical media content across time, languages, and national borders.
The first impresso project (2017-2020) developed a scalable architecture for the processing of Swiss and Luxembourgish newspaper collections and created an interface with powerful search, filter and discovery functionalities based on semantic enrichments.
The second project (2023-2027) puts forward the vision of a comprehensive connection between media archives across languages and media types. Starting in 2023, we will build a Western European corpus, integrate newspaper and radio sources, and enable data-driven transnational historical research (see scientific abstract below).
We thank the Swiss National Science Foundation (SNSF) and the Luxembourgish Fonds National de la Recherche (FNR) for their trust. We are excited and honoured to collaborate with a range of old and new partners (see below).
Two positions in Natural Language Processing open at the EPFL DHLAB (3.5 years, application deadline 21.04):
- NLP Research Data Engineer
- Postdoctoral Researcher in NLP for historical documents
One position at the UNIL (3.5 years, application deadline 04.05):
- PhD in Contemporary History
Digital Humanities Laboratory, EPFL (Maud Ehrmann)
Department of Computational Linguistics, Zurich University (Simon Clematide)
History Department, Lausanne University (Raphaëlle Ruppen Coutaz)
Centre for Contemporary and Digital History, Luxembourg University (Marten Düring)
National Library of Switzerland (Bibliothèque Nationale Suisse, BN)
National Library of Luxembourg (Bibliothèque Nationale du Luxembourg, BNL)
Austrian National Library (Österreichische Nationalbibliothek, ONB)
Berlin State Library (Staatsbibliothek zu Berlin, SBB)
The British Library (BL)
French National Library (Bibliothèque nationale de France, BnF)
Hamburg State and University Library (Staats- und Universitätsbibliothek Hamburg, HUB)
Royal Library of Belgium (Bibliothèque royale de Belgique/Koninklijke Bibliotheek van België, KBR)
Royal Library of the Netherlands (Koninklijke Bibliotheek, KB)
Radio Télévision Suisse (French-speaking part of the Schweizerische Radio- und Fernsehgesellschaft, RTS)
Austrian Broadcasting Corporation (Österreichischer Rundfunk, ORF)
British Broadcasting Corporation (BBC)
French National Audiovisual Institute (Institut National de l'Audiovisuel, INA)
Netherlands Institute for Sound and Vision (Nederlands Instituut voor Beeld en Geluid, NISV)
Neue Zürcher Zeitung
Entangled Media Histories Research Network for European media historians (EMHIS)
Memoriav, the Swiss network for audiovisual cultural heritage preservation
"impresso - Media Monitoring of the Past II" aims to pioneer new approaches for the joint exploration of newspaper and radio sources published in five languages and to advance innovative practices in historical research with transnational and transmedia perspectives. Historians, computational linguists, computer scientists, digital humanists and designers will work closely together to develop the technical means and scientific methods to reach these goals.
Since the 1990s, newspaper and radio archives have undergone massive digitization, and traditional barriers hindering the study of historical media, namely difficult access and tedious exploration, have started to fall. Millions of facsimiles and digital broadcast records, along with their machine-readable content, are now available for research. Existing tools for the exploration of digitized newspapers and radio broadcasts nevertheless remain in a fragmented landscape, where automatic processing and computational approaches are typically restricted to one language and one media type. These limitations severely hamper historical research, which is driven by the discovery of relations between its objects of study through iterative processes characterised by comparing, contrasting and associating sources and information.
This project proposes to overcome language and media barriers and to enable, for the first time, the joint exploration of newspaper and radio archive contents across time, languages and national borders. Our aim is not merely to juxtapose collections and deploy full-text search across them, but to enrich and connect these sources through multiple layers of cutting-edge semantic enrichments represented in a shared multilingual vector space, and to design adequate, meaningful and transparent exploration capabilities for historical research. Rather than a change of scale in terms of volume, we propose a paradigm shift in the processing, the representation and the study of sources in transmedia and transnational perspectives. Based on an unprecedented corpus of historical newspaper and radio collections from 8 Western European countries, the project will co-design and develop an open and generic technological framework for the seamless exploration of semantically connected media archives across languages and will focus on: a) the development of advanced multilingual natural language processing techniques to transform heterogeneous, unstructured and noisy historical media sources into semantically enriched data ultimately connected in a shared vector space; b) the advancement of digital (media) history research and methods; and c) the design and implementation of innovative interfaces to explore, visualise and compare vast amounts of enriched historical print and audio collections.
Historical research will fulfil 3 main purposes: a) conducting original research on historical media ecosystems through the lens of influence, to study how newspapers and radio competed and co-evolved (this analysis will be approached through 4 case studies that operate in a shared methodological framework to ensure overall coherence and focus on Swiss foreign policy, nuclear power and weapons, feminist activists and the evolution of content formats); b) determining methods for the data-driven study of enriched and connected media sources at scale in light of the principles of digital hermeneutics; c) identifying generalisable and diverse user requirements for data and interface design as well as collaborative research methods. The diversity of the case studies, as well as collaborations with external researchers, will ensure that user requirements and methods are sufficiently broadly defined.