Processing Large-Scale Archival Records: The Case of the Swiss Parliamentary Records

Authors
Salamanca, Luis [1]
Brandenberger, Laurence [2]
Gasser, Lilian [1]
Schlosser, Sophia [2]
Balode, Marta [2]
Jung, Vincent [2]
Perez-Cruz, Fernando [1]
Schweitzer, Frank [2]
Affiliations
[1] SDSC, Zurich, Switzerland
[2] Swiss Federal Institute of Technology (ETH Zurich), Weinbergstr. 56, CH-8092 Zurich, Switzerland
Keywords
archival records; parliamentary proceedings; Swiss parliament; text processing; text-to-data; polarization
DOI
10.1111/spsr.12590
Chinese Library Classification (CLC) number
D0 [Political science, political theory]
Subject classification code
0302; 030201
Abstract
Legislative bodies generally keep records of their activities. While the wave of digitization has increased the availability of archival documents, processing them remains a challenge. The Swiss parliamentary records are no exception. In this paper we present a supervised pipeline for extracting and structuring the content of archival records. Our pipeline consists of five steps, starting with an assessment of which elements need extraction and how they relate to each other. Step two is general pre-processing to prepare the PDF documents, followed in step three by element classification. Step four is post-processing, and the final step validates the extracted information. With our supervised approach, we are able to process over 200,000 pages of Swiss parliamentary records (spanning the years 1891-1995), a feat that would exceed the budget of most projects relying on manual curation. We discuss the validation of individual steps and offer guidance to researchers engaged in similar data processing efforts.
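The five steps described in the abstract can be pictured as a chain of functions, each consuming the previous step's output. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the label set `ELEMENT_TYPES`, the `Element` schema, the blank-line block splitting that stands in for real PDF pre-processing, the speaker/speech linking rule in `postprocess`, and `toy_model` (a placeholder heuristic where a trained classifier would go) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Step 1 (assessment): the element types to extract. Hypothetical label set.
ELEMENT_TYPES = ("speaker", "speech", "other")


@dataclass
class Element:
    """One text block from a page (hypothetical schema)."""
    page: int
    text: str
    label: Optional[str] = None  # set by the classification step


def preprocess(pages: List[str]) -> List[Element]:
    """Step 2: split page text into non-empty blocks at blank lines.

    A stand-in for real PDF/OCR pre-processing.
    """
    elements = []
    for page_no, page_text in enumerate(pages, start=1):
        for block in page_text.split("\n\n"):
            block = block.strip()
            if block:
                elements.append(Element(page=page_no, text=block))
    return elements


def classify(elements: List[Element], model: Callable[[str], str]) -> List[Element]:
    """Step 3: label each element with a supplied (e.g. trained) model."""
    for el in elements:
        label = model(el.text)
        if label not in ELEMENT_TYPES:
            raise ValueError(f"unknown element type: {label!r}")
        el.label = label
    return elements


def postprocess(elements: List[Element]) -> List[Dict[str, str]]:
    """Step 4: relate elements by attaching each speech to the preceding speaker."""
    records: List[Dict[str, str]] = []
    speaker: Optional[str] = None
    for el in elements:
        if el.label == "speaker":
            speaker = el.text.rstrip(":")
        elif el.label == "speech" and speaker is not None:
            records.append({"speaker": speaker, "text": el.text})
    return records


def validate(predicted: List[str], gold: List[str]) -> float:
    """Step 5: fraction of predicted labels that match a hand-annotated sample."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need non-empty label lists of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def toy_model(text: str) -> str:
    """Placeholder for a supervised classifier (hypothetical heuristic)."""
    words = text.split()
    if text.endswith(":") and len(words) <= 4:
        return "speaker"
    return "speech" if len(words) > 3 else "other"


# Usage on two invented pages; the gold labels mimic a small manual annotation.
pages = [
    "President Mueller:\n\nThe session is opened and we begin.",
    "Mr Meier:\n\nI move to refer the item back to committee.\n\nAgreed.",
]
elements = classify(preprocess(pages), toy_model)
records = postprocess(elements)
gold = ["speaker", "speech", "speaker", "speech", "speech"]
accuracy = validate([e.label for e in elements], gold)
print(records)
print(accuracy)  # 0.8
```

Passing the model into `classify` as a callable keeps the classification step swappable, so a real supervised model can replace `toy_model` without touching the other steps. In this toy run the heuristic labels the one-word reply "Agreed." as `other`, and the validation step surfaces that as one mismatch out of five.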
Pages: 140-153
Number of pages: 14
Related papers (50 in total)
  • [41] Catch Me If You Can: Detecting Pickpocket Suspects from Large-Scale Transit Records
    Du, Bowen
    Liu, Chuanren
    Zhou, Wenjun
    Hou, Zhenshan
    Xiong, Hui
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 87 - 96
  • [42] Implementing Large-Scale Electronic Health Records: Experiences from implementations of Epic in Denmark and Finland
    Hertzum, Morten
    Ellingsen, Gunnar
    Cajander, Asa
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2022, 167
  • [43] ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis
    Gan, Ziming
    Zhou, Doudou
    Rush, Everett
    Panickan, Vidul A.
    Ho, Yuk-Lam
    Ostrouchov, George
    Xu, Zhiwei
    Shen, Shuting
    Xiong, Xin
    Greco, Kimberly F.
    Hong, Chuan
    Bonzel, Clara-Lea
    Wen, Jun
    Costa, Lauren
    Cai, Tianrun
    Begoli, Edmon
    Xia, Zongqi
    Gaziano, J. Michael
    Liao, Katherine P.
    Cho, Kelly
    Cai, Tianxi
    Lu, Junwei
    JOURNAL OF BIOMEDICAL INFORMATICS, 2025, 162
  • [44] Landsat historical records reveal large-scale dynamics and enduring recovery of seagrasses in an impacted seascape
    Fernandes, Milena B.
    Hennessy, Andrew
    Law, Wallace Boone
    Daly, Robert
    Gaylard, Sam
    Lewis, Megan
    Clarke, Kenneth
    SCIENCE OF THE TOTAL ENVIRONMENT, 2022, 813
  • [45] Large-scale destabilization events in hydrological structure of oceans, biotic crises, and corresponding geological records
    Beznosov, VN
    STRATIGRAPHY AND GEOLOGICAL CORRELATION, 2000, 8 (03) : 211 - 220
  • [46] PHENOTREE: Interactive Visual Analytics for Hierarchical Phenotyping From Large-Scale Electronic Health Records
    Baytas, Inci M.
    Lin, Kaixiang
    Wang, Fei
    Jain, Anil K.
    Zhou, Jiayu
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (11) : 2257 - 2270
  • [48] AN AUTOMATIC SAR-BASED CHANGE DETECTION METHOD FOR GENERATING LARGE-SCALE FLOOD DATA RECORDS: THE UK AS A TEST CASE
    Zhao, Jie
    Chini, Marco
    Matgen, Patrick
    Hostache, Renaud
    Pelich, Ramona
    Wagner, Wolfgang
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019, : 6138 - 6141
  • [49] Logan: Automatic Management for Evolvable, Large-Scale, Archival Storage
    Storer, Mark W.
    Greenan, Kevin M.
    Adams, Ian F.
    Miller, Ethan L.
    Long, Darrell D. E.
    Voruganti, Kaladhar
    PDSW'08: PROCEEDINGS OF THE 2008 3RD PETASCALE DATA STORAGE WORKSHOP, 2008, : 50 - +
  • [50] Evaluating critical rainfall conditions for large-scale landslides by detecting event times from seismic records
    Kuo, Hsien-Li
    Lin, Guan-Wei
    Chen, Chi-Wen
    Saito, Hitoshi
    Lin, Ching-Weei
    Chen, Hongey
    Chao, Wei-An
    NATURAL HAZARDS AND EARTH SYSTEM SCIENCES, 2018, 18 (11) : 2877 - 2891