Creating a large-scale diachronic corpus resource: Automated parsing in the Greek papyri (and beyond)

Cited by: 0
Authors
Keersmaekers, Alek [1]
Van Hal, Toon [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Linguist, Leuven, Belgium
Keywords
Parsing; Ancient Greek; Corpus applications; LANGUAGE;
DOI
10.1017/S1351324923000384
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper explores how to syntactically parse Ancient Greek texts automatically and maps ways of fruitfully employing the results of such an automated analysis. Special attention is given to documentary papyrus texts, a large diachronic corpus of non-literary Greek, which presents a unique set of challenges to tackle. By making use of the Stanford Graph-Based Neural Dependency Parser, we show that through careful curation of the parsing data and several manipulation strategies, it is possible to achieve a Labeled Attachment Score of about 0.85 for this corpus. We also explain how the data can be converted back to its original (Ancient Greek Dependency Treebanks) format. We describe the results of several tests we have carried out to improve parsing results, with special attention paid to the impact of the annotation format on parser performance. In addition, we offer a detailed qualitative analysis of the remaining errors, including possible ways to solve them. Moreover, the paper gives an overview of the valorisation possibilities of an automatically annotated corpus of Ancient Greek texts in the fields of linguistics, language education and humanities studies in general. The concluding section critically analyses the remaining difficulties and outlines avenues to further improve the parsing quality and the ensuing practical applications.
Pages: 30