A Universal Part-of-Speech Tagset

Authors
Petrov, Slav [1]
Das, Dipanjan [2]
McDonald, Ryan [1]
Affiliations
[1] Google Res, New York, NY 10011 USA
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Part-of-Speech Tagging; Multilinguality; Annotation Guidelines
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
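The kind of mapping the abstract describes, from a treebank-specific tagset to the twelve universal categories, can be sketched as a simple lookup table. The following is a minimal illustration, not the mapping distributed with the paper: the twelve tag names are those commonly associated with this tagset (the abstract itself does not list them), the Penn Treebank entries are a small hand-picked sample, and the function name `to_universal` and its fallback to `X` for unknown tags are assumptions made for this example.

```python
# Illustrative sketch only: a fine-grained -> universal tag lookup.
# The twelve category names are assumed from common usage of this tagset;
# the Penn Treebank entries are a small sample, not the full mapping.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT",
    ".": ".", ",": ".",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Map (word, fine_tag) pairs to (word, universal_tag) pairs.

    Hypothetical helper: fine-grained tags missing from the mapping are
    assigned the catch-all category given by `fallback`.
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged]


sentence = [("The", "DT"), ("dog", "NN"), ("barked", "VBD"), (".", ".")]
print(to_universal(sentence))
```

Because every treebank is reduced to the same small label set, a tagger or parser trained on one language's universal tags can be evaluated on, or transferred to, another language, which is the setting of the experiments the abstract mentions.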
Pages: 2089-2096
Page count: 8