Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

Cited by: 416
Authors
Toutanova, K [1]
Manning, CD [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE 2000 JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA | 2000
DOI
10.3115/1117794.1117802
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents results for a maximum-entropy-based part of speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
Pages: 63-70
Page count: 8