SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts

Authors
Proisl, Thomas [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg, Prof Korpuslinguist, Bismarckstr 6, D-91054 Erlangen, Germany
Keywords
part-of-speech tagging; domain adaptation; evaluation; PERCEPTRON
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Off-the-shelf part-of-speech taggers typically perform relatively poorly on web and social media texts since those domains are quite different from the newspaper articles on which most tagger models are trained. In this paper, we describe SoMeWeTa, a part-of-speech tagger based on the averaged structured perceptron that is capable of domain adaptation and that can use various external resources. We train the tagger on the German web and social media data of the EmpiriST 2015 shared task. Using the TIGER corpus as background data and adding external information about word classes and Brown clusters, we substantially improve on the state of the art for both the web and the social media data sets. The tagger is available as free software.
Pages: 665-670
Number of pages: 6