archival records;
parliamentary proceedings;
Swiss parliament;
text processing;
text-to-data;
POLARIZATION;
DOI:
10.1111/spsr.12590
Chinese Library Classification (CLC) number:
D0 [Political Science, Political Theory];
Discipline classification code:
0302;
030201;
Abstract:
Legislative bodies generally keep records of their activities. While the digitization wave has spurred the availability of archival documents, processing them remains a challenge. The Swiss parliamentary records are no exception. In this paper, we present a supervised pipeline for extracting and structuring the content of archival records. Our pipeline consists of five steps, starting with an assessment of which elements need extraction and how they relate to each other. Step two involves general pre-processing to prepare the PDF documents and is followed by an element classification step. Step four involves post-processing, and the final step is a validation of the extracted information. With our supervised approach, we are able to process over 200,000 pages of Swiss parliamentary records (spanning the years 1891-1995), a feat that would exceed the budget of most projects using manual curation. We discuss validation of individual steps and offer guidance to researchers engaged in similar data processing efforts.