Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

Cited by: 416
Authors
Toutanova, K [1]
Manning, CD [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE 2000 JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA | 2000
DOI
10.3115/1117794.1117802
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we obtain improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
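The abstract names three feature families layered on a maximum entropy tagger. As a rough illustration of how such features plug into the model, the sketch below scores tags with the standard log-linear form p(t|h) = exp(Σ_j λ_j f_j(h,t)) / Z(h), where each binary feature pairs a contextual predicate with a tag. Everything concrete here is an assumption for illustration: the function names `extract_features` and `tag_distribution`, the `HAVE_BE_FORMS` word list, the 8-word `WINDOW`, the suffix templates, and the toy weights are hypothetical and are not the paper's exact feature set or trained parameters.

```python
import math

# Hypothetical list of "have"/"be" forms; not the paper's exact word list.
HAVE_BE_FORMS = {"have", "has", "had", "having", "be", "am", "is", "are",
                 "was", "were", "been", "being"}
WINDOW = 8  # assumed look-back window for the verb-tense cue


def extract_features(words, i, prev_tag, known_words):
    """Return contextual predicates for position i (illustrative templates only)."""
    w = words[i]
    lw = w.lower()
    feats = [f"w={lw}", f"prev_tag={prev_tag}"]

    if lw not in known_words:
        # (i) Capitalization for unknown words, split by sentence position.
        if w[:1].isupper():
            feats.append("cap_initial" if i == 0 else "cap_noninitial")
        if any(c.isdigit() for c in w):
            feats.append("has_digit")
        # Suffixes up to length 4, a common unknown-word cue.
        for k in range(1, 5):
            if len(w) > k:
                feats.append(f"suffix={lw[-k:]}")

    # (ii) Verb-tense cue: a form of "have"/"be" somewhere in the preceding window.
    window = [x.lower() for x in words[max(0, i - WINDOW):i]]
    if any(x in HAVE_BE_FORMS for x in window):
        feats.append("have_or_be_before")

    # (iii) Particle cue: pair the previous word (a candidate verb) with this word.
    if i > 0:
        feats.append(f"prev_word+w={words[i - 1].lower()}+{lw}")
    return feats


def tag_distribution(feats, tags, weights):
    """Maximum entropy conditional: p(t|h) = exp(sum_j lambda_j f_j(h,t)) / Z(h).

    Each (predicate, tag) pair is a binary feature; `weights` maps it to lambda_j.
    """
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())  # the normalizer Z(h)
    return {t: e / z for t, e in exps.items()}


# Usage with made-up weights (not trained values from the paper).
words = ["He", "had", "worked", "out"]
feats = extract_features(words, 2, "VBD", known_words={"he", "had", "out"})
toy_weights = {("have_or_be_before", "VBN"): 1.5,
               ("suffix=ed", "VBD"): 0.5,
               ("suffix=ed", "VBN"): 0.5}
dist = tag_distribution(feats, ["NN", "VBD", "VBN"], toy_weights)
print(max(dist, key=dist.get), round(dist["VBN"], 2))  # VBN 0.74
```

In this toy setup the preceding "had" activates the tense cue, so the past participle reading (VBN) outscores the simple past (VBD) even though both share the "-ed" suffix evidence. In a real tagger the weights would be estimated from the training corpus rather than set by hand.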
Pages: 63-70
Page count: 8
Related papers (50 in total)
  • [1] Part-of-Speech Tagger Based on Maximum Entropy Model
    Huang Heyan
    Zhang Xiaofei
    2009 2ND IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 3, 2009: 26-29
  • [2] Implementing an efficient part-of-speech tagger
    Carlberger, J
    Kann, V
    SOFTWARE-PRACTICE & EXPERIENCE, 1999, 29 (09): 815-832
  • [3] An Accurate Persian Part-of-Speech Tagger
    Okhovvat, Morteza
    Sharifi, Mohsen
    Bidgoli, Behrouz Minaei
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2020, 35 (06): 423-430
  • [4] A Practical Part-of-Speech Tagger for Bengali
    Sarkar, Kamal
    Gayen, Vivekananda
    2012 THIRD INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2012: 36-40
  • [5] An Efficient Part-of-Speech Tagger for Arabic
    Kopru, Selcuk
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PT I, 2011, 6608: 202-213
  • [6] TnT - A statistical part-of-speech tagger
    Brants, T
    6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000: 224-231
  • [7] Tamil Part-of-Speech tagger based on SVMTool
    Dhanalakshmi, V
    Anandkumar, M.
    Vijaya, M. S.
    Loganathan, R.
    Soman, K. P.
    Rajendran, S.
    RECENT ADVANCES OF ASIAN LANGUAGE PROCESSING TECHNOLOGIES, 2008: 59 ff.
  • [8] Toward an Effective Igbo Part-of-Speech Tagger
    Onyenwe, Ikechukwu E.
    Hepple, Mark
    Chinedu, Uchechukwu
    Ezeani, Ignatius
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (04)
  • [9] A suffix based part-of-speech tagger for Turkish
    Dincer, Taner
    Karaoglan, Bahar
    Kisla, Tarik
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008: 680 ff.
  • [10] MedPost: a part-of-speech tagger for bioMedical text
    Smith, L
    Rindflesch, T
    Wilbur, WJ
    BIOINFORMATICS, 2004, 20 (14): 2320-2321