An automatic part-of-speech tagger for Middle Low German

Cited by: 2
Authors
Koleva, Mariya [1]
Farasyn, Melissa [2]
Desmet, Bart [1]
Breitbarth, Anne [2]
Hoste, Veronique [1]
Affiliations
[1] Univ Ghent, Language & Translat Technol Team LT3, Groot Brittannielaan 45, B-9000 Ghent, Belgium
[2] Univ Ghent, Dept Linguist IaLing, Blandijnberg 2, B-9000 Ghent, Belgium
Keywords
historical linguistics; part-of-speech tagging; conditional random fields; feature selection; normalization
DOI
10.1075/ijcl.22.1.05kol
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.
Pages: 107-140
Page count: 34