In the final year of the project, the impresso team has planned several events related to the historical and NLP aspects of digitised newspapers.

Digitised newspapers - a new Eldorado for historians?

From manual, on-site exploration of microfilm or paper collections to online keyword search over millions of OCRized pages, access to digitised newspapers has changed significantly. Coupled with automatic enrichment of sources via text and image processing, this represents a whole world of new possibilities. An Eldorado? Despite their undeniable merits, the digital transformation and the new affordances of historical newspapers also bring drawbacks and possible pitfalls which need to be carefully assessed. The Eldorado workshop, supported by the impresso project, will bring together a group of historians, librarians, computer scientists and designers to discuss how digitisation is changing historical research practices.

More information here

HIPE (Identifying Historical People, Places and other Entities)

HIPE (Identifying Historical People, Places and other Entities) is a named entity processing evaluation campaign on historical newspapers in French, German and English, organised in the context of the impresso project and run as a CLEF 2020 Evaluation Lab. Since its introduction some twenty years ago, named entity (NE) processing has become an essential component of virtually any text mining application and has undergone major changes. Two main trends characterise its recent development: the adoption of deep learning architectures, and the consideration of textual material originating from historical and cultural heritage collections. While the former opens up new opportunities, the latter introduces new challenges with heterogeneous, historical and noisy inputs. Although NE processing tools are increasingly used on historical documents, their performance is lower than on contemporary data, and results are hardly comparable. In this context, the objective of HIPE is threefold:

  • to strengthen the robustness of existing approaches on non-standard input;
  • to enable performance comparison of NE processing on historical texts; and, in the long run,
  • to foster efficient semantic indexing of historical documents in order to support scholarship on digital cultural heritage collections.

More information here