A Dataset for Analysing Complex Document Layouts in the Digital Humanities and Its Evaluation with Krippendorff's Alpha

Cited by: 2
Authors
Tschirschwitz, David [1]
Klemstein, Franziska [1]
Stein, Benno [1]
Rodehorst, Volker [1]
Affiliations
[1] Bauhaus Univ, Weimar, Germany
Source
Keywords
Document layout analysis; Digital humanities; Instance segmentation; Inter-annotator-agreement;
DOI
10.1007/978-3-031-16788-1_22
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce a new research resource in the form of a high-quality, domain-specific dataset for analysing the layout of historical documents. The dataset provides an instance segmentation ground truth with 19 classes based on historical layout structures that stem (a) from the publication production process and the respective genres (life sciences, architecture, art, decorative arts, etc.) and (b) from selected text registers (such as monograph, trade journal, illustrated magazine). Altogether, the dataset contains more than 52,000 instances annotated by experts. A baseline has been tested with the well-known Mask R-CNN and compared with the state-of-the-art model VSR [55]. Inspired by evaluation practices from the field of Natural Language Processing (NLP), we have developed a new method for evaluating annotation consistency. Our method is based on Krippendorff's alpha (K-alpha), a statistic for quantifying inter-annotator agreement. In particular, we propose an adaptation of K-alpha that treats annotations as a multipartite graph in order to assess the agreement of a variable number of annotators. The method is adjustable with regard to evaluation strictness, and it can be used in 2D or 3D as well as for a variety of tasks such as semantic segmentation, instance segmentation, and 3D point cloud segmentation.
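For orientation, the statistic named in the abstract can be sketched in its standard nominal form. This is not the paper's multipartite-graph adaptation: the sketch assumes that annotations from different annotators have already been matched into units (for example, one unit per layout region), and the function name `krippendorff_alpha_nominal` and the input layout are hypothetical choices made for this illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Standard Krippendorff's alpha for nominal labels.

    `units` is a list of lists (assumed layout): each inner list holds the
    class labels that the annotators assigned to one unit. The number of
    labels per unit may vary; use None for a missing annotation.
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue  # a unit with fewer than two labels cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement (nominal metric: 0 if equal, else 1).
    # Both are left unnormalised by n because the factor cancels in the ratio.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    return 1.0 - observed / expected


if __name__ == "__main__":
    example = [["text", "text"], ["text", "image"],
               ["image", "image"], ["image", "image"]]
    print(krippendorff_alpha_nominal(example))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. How units are formed from overlapping regions, and how strict that matching is, is the part the paper addresses and is left out of this sketch.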
Pages: 354-374
Page count: 21