Enriching the knowledge sources used in a maximum entropy part-of-speech tagger

Cited by: 416
Authors
Toutanova, K [1]
Manning, CD [1]
Affiliation
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
PROCEEDINGS OF THE 2000 JOINT SIGDAT CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND VERY LARGE CORPORA | 2000
DOI
10.3115/1117794.1117802
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents results for a maximum-entropy-based part of speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
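As a rough illustration of item (i), the sketch below shows what binary capitalization features for a single token might look like before they are passed to a maximum entropy model. The function name, the feature names, and the choice of features are assumptions made for this example; they are not the feature templates used in the paper.

```python
def capitalization_features(tokens, i):
    """Return illustrative binary features for the token at index i.

    Hypothetical feature set: the names and definitions below are
    assumptions for this sketch, not the paper's actual templates.
    """
    word = tokens[i]
    starts_upper = word[:1].isupper()
    return {
        # First character is an uppercase letter.
        "is_capitalized": starts_upper,
        # Every cased character in the word is uppercase.
        "all_caps": word.isupper(),
        # Capitalized and not sentence-initial, so the capital is
        # less likely to be due to sentence position alone.
        "capitalized_mid_sentence": i > 0 and starts_upper,
        # Word contains at least one digit.
        "has_digit": any(ch.isdigit() for ch in word),
        # Word contains a hyphen.
        "has_hyphen": "-" in word,
    }


# Example: features for "Geneva" in the middle of a sentence.
feats = capitalization_features(["The", "talks", "in", "Geneva", "resumed"], 3)
```

Separating sentence-initial from mid-sentence capitalization is one plausible way to give a tagger more evidence about unknown words; a real system would combine such features with word, suffix, and tag-context features.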
Pages: 63-70
Number of pages: 8
Related papers
50 items in total
  • [21] Adding Morphological Information to a Connectionist Part-Of-Speech Tagger
    Zamora-Martinez, Francisco
    Jose Castro-Bleda, Maria
    Espana-Boquera, Salvador
    Tortajada-Velert, Salvador
    CURRENT TOPICS IN ARTIFICIAL INTELLIGENCE, 2010, 5988 : 191 - +
  • [22] Building an Indonesian Rule-Based Part-of-Speech Tagger
    Rashel, Fam
    Luthfi, Andry
    Dinakaramani, Arawinda
    Manurung, Ruli
    PROCEEDINGS OF THE 2014 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2014), 2014, : 70 - 73
  • [23] Bayesian reinforcement for a probabilistic neural net Part-of-Speech tagger
    Maragoudakis, M
    Ganchev, T
    Fakotakis, N
    TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2004, 3206 : 137 - 145
  • [24] Development of a multilingual parallel corpus and a part-of-speech tagger for Afrikaans
    Trushkina, Julia
    Intelligent Information Processing III, 2006, 228 : 453 - 462
  • [25] A Supervised Part-Of-Speech Tagger for the Greek Language of the Social Web
    Nikiforos, Maria Nefeli
    Kermanidis, Katia Lida
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3861 - 3867
  • [26] Arabic part-of-speech tagger based support vectors machines
    Yousif, Jabar Hassan
    Sembok, Tengku Mohd Tengku
    INTERNATIONAL SYMPOSIUM OF INFORMATION TECHNOLOGY 2008, VOLS 1-4, PROCEEDINGS: COGNITIVE INFORMATICS: BRIDGING NATURAL AND ARTIFICIAL KNOWLEDGE, 2008, : 2084 - +
  • [27] Choosing a Spanish Part-of-Speech tagger for a lexically sensitive task
    Escartin, Carla Parra
    Alonso, Hector Martinez
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2015, (54): : 29 - 36
  • [28] Detecting Syntactic Change Using a Neural Part-of-Speech Tagger
    Merrill, William
    Stark, Gigi Felice
    Frank, Robert
    1ST INTERNATIONAL WORKSHOP ON COMPUTATIONAL APPROACHES TO HISTORICAL LANGUAGE CHANGE, 2019, : 167 - 174
  • [29] Part-of-speech tagger for Bodo language using deep learning approach
    Pathak, Dhrubajyoti
    Narzary, Sanjib
    Nandi, Sukumar
    Som, Bidisha
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): : 215 - 229
  • [30] SoMeWeTa: A Part-of-Speech Tagger for German Social Media and Web Texts
    Proisl, Thomas
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 665 - 670