Heidelberg Research Architecture Early Chinese Periodicals Online

The project

The platform Early Chinese Periodicals Online (ECPO) was first implemented from 2013 to 2016 in collaboration with the Institute of Modern History, Academia Sinica, Taiwan, with the generous support of the Chiang Ching-kuo Foundation for International Scholarly Exchange.
ECPO has since been developed with further support from various institutions, such as the Centre for Asian and Transcultural Studies (CATS) Library, the Heidelberg Centre for Transcultural Studies (HCTS), the Institute of Chinese Studies and the Research Council Cultural Dynamics in Globalized Worlds at the University of Heidelberg, along with the Confucius-Institute Heidelberg and the University of Erlangen-Nürnberg as affiliated partners.

The ECPO database provides open access to material from several important digital collections of the early Chinese press. It combines what we call “extensive” and “intensive” approaches to China’s early periodicals. The extensive approach comprises a comprehensive catalog of Republican-era art and literary periodicals, with basic metadata such as title, editor, publisher, location and dates of publication, periodicity, format, and prominent contributors. In the intensive approach, we not only record bibliographic information but also provide digital cover-to-cover copies of entire runs of periodicals, together with an in-depth analysis of their complete contents (articles, images, advertisements) and structured metadata.

So far, six journals (four women’s journals and two entertainment periodicals) have been included in the database using the intensive approach, together with the four magazines analyzed in depth that form the core of the “Chinese Women’s Magazines in the Late Qing and Early Republican Period” (WoMag) database. In the extensive section of the database, we have recorded over 250 periodicals, many of them hosted in databases at the Academia Sinica. This includes newspapers formerly part of the “Chinese Entertainment Newspapers” (Xiaobao) database. The project aims to provide metadata in both English and Chinese, and adds research-based annotations and biographical information on editors, authors, and individuals represented in illustrations and advertisements in the journals.

For a large number of periodicals we provide detailed information about their publishing history, also covering format, size, issuing frequency, prices, as well as a list of prominent agents related to the publication, a summary of its contents, and a list of its holdings in important collections worldwide.

We developed a web-based data management interface to ingest new periodicals and edit metadata. For more efficient data entry, we created a “rapid editing interface” which features a Chinese frontend.

Based on the success of ECPO, we developed a number of follow-up projects, for example “ECPO – towards automatic full text generation”, “ECPO – historical network data”, or “Jing bao ground truth”. 

Digital output

ECPO is an open access resource developed by the Heidelberg Research Architecture (HRA) and implemented mainly by independent programmers. It currently provides browse and search functionalities for about 300 publications, which can all be read online. The database is freely available at https://uni-heidelberg.de/ecpo.

All bibliographic descriptions can be accessed through the ECPO API in MODS XML format. 
ECPO periodicals are also included in national library catalogs such as the Zeitschriftendatenbank (ZDB).
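
As a rough illustration of working with bibliographic descriptions in MODS XML, the following Python sketch extracts a few basic fields from a single record. The sample record, the function names, and the choice of fields are assumptions made for this example; they are not taken from an actual ECPO API response, and no API address is assumed here. Only the MODS namespace (http://www.loc.gov/mods/v3) is standard.

```python
import xml.etree.ElementTree as ET

# Standard MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, invented for illustration only.
# It is NOT an actual response from the ECPO API.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>Example Periodical</title>
  </titleInfo>
  <originInfo>
    <place><placeTerm type="text">Shanghai</placeTerm></place>
    <publisher>Example Publisher</publisher>
    <dateIssued>1926</dateIssued>
  </originInfo>
</mods>"""


def first_text(element, path):
    """Return the stripped text of the first match for path, or None."""
    found = element.find(path, MODS_NS)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def summarize_mods(xml_text):
    """Extract a few basic bibliographic fields from one MODS record."""
    root = ET.fromstring(xml_text)
    return {
        "title": first_text(root, "mods:titleInfo/mods:title"),
        "place": first_text(root, "mods:originInfo/mods:place/mods:placeTerm"),
        "publisher": first_text(root, "mods:originInfo/mods:publisher"),
        "date_issued": first_text(root, "mods:originInfo/mods:dateIssued"),
    }


if __name__ == "__main__":
    print(summarize_mods(SAMPLE_MODS))
```

Fields that are missing from a record come back as None, so the same function can be applied to records of varying completeness.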

As a result of the detailed metadata recording, ECPO now comprises more than 50,000 different names of persons, groups, or institutions. We recorded these names as they occur in the original publications and developed a cross-database agent service for search and retrieval. In addition, the agent service allows us to manage name records, assign them to individual agent records, or split similar names among different agents. We use international authority files such as the Virtual International Authority File and the German National Authority File, larger knowledge bases such as Wikidata and DBpedia, and the Chinese encyclopedia Baidu Baike to uniquely identify our agent records and to provide users with links to additional information on the respective agent. For an example, see the agent record of Bao Tianxiao 包天笑.

ECPO already contains some full-text passages. To make these discoverable, we have expanded the database structure and added full-text search to the search functionality.

Data set

ECPO not only provides free and open access to the scanned materials; we also publish its metadata. The main data publication is the openly licensed data set “Early Chinese Periodicals Online (ECPO) [Metadata]” on heiDATA, the research data repository of Heidelberg University.

It covers all major types of metadata created by the ECPO project: the 50,000 name and agent records that form the basis of the agent service, the bibliographic metadata of all periodicals (also available via the MODS API), and all research-based annotations recorded in individual item records, i.e. articles, images, and advertisements.

ECPO has also been publishing code on GitHub. The public repositories include “ECPO data”, “ECPO annotator”, “ECPO segment”, and “ECPO fulltext experiments”. 

Related projects

ECPO triggered a number of child projects, such as ECPO full text.

Selected publications and presentations

The theoretical background of the project is best illustrated in this edited volume: 
Hockx, Michel, Joan Judge, and Barbara Mittler, eds. Women and the Periodical Press in China’s Long Twentieth Century: A Space of Their Own? Cambridge: Cambridge University Press, 2018. DOI: 10.1017/9781108304085.

Henke, Konstantin, and Matthias Arnold. “Language Model Assisted OCR Classification for Republican Chinese Newspaper Text(以語言模型輔助民國報紙文本的光學字元辨識分類).” In: 數位典藏與數位人文 12期(2023年10月)(Journal of Digital Archives and Digital Humanities, Issue 12, 10.2023), 1-19.

Arnold, Matthias, Duncan Paterson, and Jia XIE. “Procedural challenges: the FAIR principles and PRC electronic resources - a case study of Chinese republican newspapers.” In: Special issue "Digital Humanities and East Asian Studies," ed. Hilde De Weerdt and Aliz Horvath. International Journal of Digital Humanities 4 (2023), 147–170. DOI: 10.1007/s42803-022-00055-6.

Arnold, Matthias, and Lena Hessel. “Transforming data silos into knowledge: Early Chinese Periodicals Online (ECPO).” In: E-Science-Tage 2019: Data to Knowledge. Ed. by Vincent Heuveline, Fabian Gebhard, and Nina Mohammadianbisheh. Heidelberg: heiBOOKS, 2020, pp. 95-109. DOI: 10.11588/heibooks.598.c8420.

Sung, Doris, Liying Sun, and Matthias Arnold. “The Birth of a Database of Historical Periodicals: Chinese Women’s Magazines in the Late Qing and Early Republican Period.” In: Tulsa Studies in Women's Literature 33, no. 2 (2014): 227-37.

Recent talks

Henke, Konstantin, and Matthias Arnold. “Building an OCR Pipeline for a Republican Chinese Entertainment Newspaper.” Paper presented at the ADHO DH2022 conference, Tokyo, Long presentations 7-04, #385, 2022-07-28. DOI: 10.5281/zenodo.6646899.

Arnold, Matthias. “Wege zur Erschließung der frühen chinesischen Presse.” Paper presented at the Workshop OCR - Herausforderungen und Lösungen für Zeitungen & Zeitschriften, organised by the DHd AG Zeitungen & Zeitschriften, Frankfurt/Main, November 11, 2019. DOI: 10.5281/zenodo.5752478.

Arnold, Matthias. “Multilingual research projects: Challenges (and possible solutions) for making use of standards, authority files, and character recognition.” Paper presented at the full-day workshop Towards multilingualism in Digital Humanities: Achievements, failures and good practices in DH Projects with non-latin scripts (NLS), organized by Cosima Wagner and Martin Lee. ADHO DH2019 conference, Utrecht, 2019-07-08.
