Processing Large-Scale Archival Records: The Case of the Swiss Parliamentary Records

Authors
Salamanca, Luis [1]
Brandenberger, Laurence [2]
Gasser, Lilian [1]
Schlosser, Sophia [2]
Balode, Marta [2]
Jung, Vincent [2]
Perez-Cruz, Fernando [1]
Schweitzer, Frank [2]
Affiliations
[1] SDSC, Zurich, Switzerland
[2] Swiss Fed Inst Technol, Weinbergstr 56, CH-8092 Zurich, Switzerland
Keywords
archival records; parliamentary proceedings; Swiss parliament; text processing; text-to-data; POLARIZATION;
DOI
10.1111/spsr.12590
Chinese Library Classification number
D0 [Political science, political theory];
Discipline classification code
0302 ; 030201 ;
Abstract
Legislative bodies generally keep records of their activities. While the digitization wave spurred the availability of archival documents, their processing remains a challenge. The Swiss parliamentary records are no exception. In this paper, we present a supervised pipeline for extracting and structuring the content of archival records. Our pipeline consists of five steps, starting with an assessment of which elements need extraction and how they relate to each other. Step two involves general pre-processing to prepare the PDF documents and is followed by an element classification step. Step four involves post-processing and the final step is a validation of the extracted information. With our supervised approach, we are able to process over 200,000 pages of Swiss parliamentary records (spanning the years 1891-1995), a feat that would exceed the budget of most projects using manual curation. We discuss validation of individual steps and offer guidance to researchers engaged in similar data processing efforts.
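As a rough illustration only, the five steps named in the abstract could be arranged along the following lines. This is a minimal sketch and not the authors' implementation: the label set, the function names, the rule that stands in for the supervised classifier, and the sample text are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

# Step 1 (assessment): decide which elements to extract and how they relate.
# Assumed here: each "speech" line belongs to the most recent "speaker" line.
LABELS = ("speaker", "speech")  # hypothetical label set


@dataclass
class Element:
    text: str
    label: str = "unknown"


def preprocess(raw_lines: List[str]) -> List[Element]:
    """Step 2: collapse runs of whitespace and drop empty lines."""
    cleaned = (" ".join(line.split()) for line in raw_lines)
    return [Element(text=t) for t in cleaned if t]


def classify(elements: List[Element]) -> List[Element]:
    """Step 3: toy rule standing in for a trained classifier.

    Assumption: a line ending in ':' names a speaker.
    """
    for el in elements:
        el.label = "speaker" if el.text.endswith(":") else "speech"
    return elements


def postprocess(elements: List[Element]) -> List[Dict]:
    """Step 4: attach each speech line to the most recent speaker.

    Speech lines that appear before any speaker are skipped.
    """
    records: List[Dict] = []
    for el in elements:
        if el.label == "speaker":
            records.append({"speaker": el.text.rstrip(":"), "speech": []})
        elif records:
            records[-1]["speech"].append(el.text)
    return records


def validate(records: List[Dict]) -> List[Dict]:
    """Step 5: return records that look incomplete (no speech text)."""
    return [r for r in records if not r["speech"]]


def run_pipeline(raw_lines: List[str]) -> List[Dict]:
    return postprocess(classify(preprocess(raw_lines)))


if __name__ == "__main__":
    sample = ["  Speaker A:  ", "I  move to adopt.", "", "Speaker B:"]
    records = run_pipeline(sample)
    print(records)
    print("needs review:", validate(records))
```

In a real setting, the classification step would be a model trained on labelled pages, and validation would compare extracted records against a manually curated sample.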
Pages: 140-153
Page count: 14