Processing Large-Scale Archival Records: The Case of the Swiss Parliamentary Records

Cited by: 0
Authors
Salamanca, Luis [1]
Brandenberger, Laurence [2]
Gasser, Lilian [1]
Schlosser, Sophia [2]
Balode, Marta [2]
Jung, Vincent [2]
Perez-Cruz, Fernando [1]
Schweitzer, Frank [2]
Affiliations
[1] SDSC, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Weinbergstr 56, CH-8092 Zurich, Switzerland
Keywords
archival records; parliamentary proceedings; Swiss parliament; text processing; text-to-data; polarization
DOI
10.1111/spsr.12590
Chinese Library Classification number
D0 [Political science, political theory]
Subject classification codes
0302; 030201
Abstract
Legislative bodies generally keep records of their activities. While the wave of digitization has made archival documents more widely available, processing them remains a challenge, and the Swiss parliamentary records are no exception. In this paper, we present a supervised pipeline for extracting and structuring the content of archival records. The pipeline consists of five steps. The first step assesses which elements need to be extracted and how they relate to each other. The second step applies general pre-processing to prepare the PDF documents, and the third classifies the document elements. The fourth step performs post-processing, and the final step validates the extracted information. With this supervised approach, we are able to process over 200,000 pages of Swiss parliamentary records spanning the years 1891 to 1995, a task that would exceed the budget of most projects relying on manual curation. We discuss the validation of the individual steps and offer guidance to researchers engaged in similar data-processing efforts.
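The five steps outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the element types (`speaker`, `speech`), the `SCHEMA` mapping, every function name, and the toy German input are hypothetical, and a simple keyword rule in `classify` stands in for the trained classifier that a supervised pipeline would actually use.

```python
import re
from dataclasses import dataclass, field

# Step 1 (hypothetical schema): which element types to extract and how they
# relate. Here each "speech" belongs to the most recent "speaker" heading.
SCHEMA = {"speaker": None, "speech": "speaker"}


@dataclass
class Line:
    page: int
    text: str
    label: str = ""


@dataclass
class Speech:
    speaker: str
    text: str = ""
    pages: list[int] = field(default_factory=list)


def preprocess(pages: list[str]) -> list[Line]:
    """Step 2: split raw page text into whitespace-normalized lines,
    re-joining words hyphenated across a line break."""
    lines = []
    for page_no, raw in enumerate(pages, start=1):
        joined = re.sub(r"-\n(\w)", r"\1", raw)
        for text in joined.splitlines():
            text = " ".join(text.split())
            if text:
                lines.append(Line(page_no, text))
    return lines


def classify(line: Line) -> str:
    """Step 3: label a line. This keyword rule is a placeholder for a
    trained (supervised) classifier."""
    if line.text.endswith(":") and len(line.text.split()) <= 4:
        return "speaker"
    return "speech"


def postprocess(lines: list[Line]) -> list[Speech]:
    """Step 4: attach speech lines to their parent element per SCHEMA.
    Speech lines that appear before any speaker are dropped."""
    parent_label = SCHEMA["speech"]
    speeches: list[Speech] = []
    for line in lines:
        if line.label == parent_label:
            speeches.append(Speech(speaker=line.text.rstrip(":")))
        elif speeches:
            current = speeches[-1]
            current.text = (current.text + " " + line.text).strip()
            if line.page not in current.pages:
                current.pages.append(line.page)
    return speeches


def validate(speeches: list[Speech]) -> list[str]:
    """Step 5: simple consistency checks; returns the problems found."""
    return [
        f"speech {i} by {s.speaker!r} is empty"
        for i, s in enumerate(speeches)
        if not s.text
    ]


def run_pipeline(pages: list[str]) -> tuple[list[Speech], list[str]]:
    lines = preprocess(pages)
    for line in lines:
        line.label = classify(line)
    speeches = postprocess(lines)
    return speeches, validate(speeches)


# Toy two-page input, invented for illustration only.
pages = ["Präsident:\nDie Sitzung ist eröff-\nnet.\nMüller:\nIch beantrage",
         "Rückweisung.\n"]
speeches, problems = run_pipeline(pages)
for s in speeches:
    print(s.speaker, s.pages, s.text)
```

Keeping each step in its own function makes it possible to check the steps individually, in line with the per-step validation the abstract mentions; in a real pipeline, `classify` would be replaced by a model trained on annotated pages.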
Pages: 140-153
Number of pages: 14