An automatic part-of-speech tagger for Middle Low German

Cited by: 2

Authors:

Koleva, Mariya [1]

Farasyn, Melissa [2]

Desmet, Bart [1]

Breitbarth, Anne [2]

Hoste, Veronique [1]

Affiliations:

[1] Univ Ghent, Language & Translat Technol Team LT3, Groot Brittannielaan 45, B-9000 Ghent, Belgium

[2] Univ Ghent, Dept Linguist IaLing, Blandijnberg 2, B-9000 Ghent, Belgium

Source:

International Journal of Corpus Linguistics | 2017 / Vol. 22 / No. 1

Keywords:

historical linguistics; part-of-speech tagging; conditional random fields; feature selection; normalization

DOI:

10.1075/ijcl.22.1.05kol

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in building the corpus, viz. the development of a part-of-speech (POS) tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalised spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.
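Why spelling variation hurts taggers that key on exact word forms can be sketched with token-level features of the kind commonly fed to a conditional random field (CRF) tagger. The sketch below is illustrative only and is not the authors' feature set: the `normalise` rules (lowercasing, merging u/v and i/y, collapsing doubled letters) are hypothetical, and `token_features` / `sent_features` are invented names. It emits one feature dict per token, the list-of-dicts format accepted by CRF toolkits such as sklearn-crfsuite.

```python
import re
from typing import Dict, List


def normalise(token: str) -> str:
    """Hypothetical, deliberately crude spelling normalisation.

    Assumption for illustration only; these are NOT the rules used for the
    CHLG tagger. Lowercase, merge a few interchangeable letters, and
    collapse runs of doubled letters.
    """
    t = token.lower()
    t = t.replace("v", "u").replace("y", "i")  # illustrative letter mappings
    return re.sub(r"(.)\1+", r"\1", t)          # e.g. 'tt' -> 't'


def token_features(sent: List[str], i: int) -> Dict[str, object]:
    """Feature dict for token i, computed on normalised forms so that
    spelling variants share most of their features."""
    w = sent[i]
    n = normalise(w)
    feats: Dict[str, object] = {
        "bias": 1.0,
        "norm": n,                              # normalised word form
        "suffix3": n[-3:],                      # affixes generalise across variants
        "prefix2": n[:2],
        "is_upper_initial": w[:1].isupper(),    # capitalisation from the raw token
        "is_digit": w.isdigit(),
    }
    # Context features from the neighbouring tokens, or sentence-boundary flags.
    if i > 0:
        feats["prev_norm"] = normalise(sent[i - 1])
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["next_norm"] = normalise(sent[i + 1])
    else:
        feats["EOS"] = True
    return feats


def sent_features(sent: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token in the sentence."""
    return [token_features(sent, i) for i in range(len(sent))]
```

With these crude rules, the hypothetical variants `vnde` and `unde` receive the same `norm` value, so what a model learns for one spelling carries over to the other. Real MLG normalisation would need far more careful, corpus-specific rules than this sketch assumes.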


Pages: 107–140

Page count: 34
