Unsupervised word sense disambiguation with N-gram features

Cited by: 10
Authors
Preotiuc-Pietro, Daniel [1 ]
Hristea, Florentina [2 ]
Affiliations
[1] Univ Sheffield, Dept Comp Sci, Sheffield S1 4DP, S Yorkshire, England
[2] Univ Bucharest, Dept Comp Sci, Bucharest 010014, Romania
Keywords
Bayesian classification; The EM algorithm; Word sense disambiguation; Unsupervised disambiguation; Web-scale N-grams
DOI
10.1007/s10462-011-9306-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The present paper concentrates on the issue of feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used for the first time in unsupervised WSD. While creating features from unlabeled data, we are "helping" a simple, basic knowledge-lean disambiguation algorithm to significantly increase its accuracy as a result of receiving easily obtainable knowledge. The performance of this method is compared to that of others that rely on completely different feature sets. Test results concerning nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a "quality list" of features, adapted to the part of speech, is used.
Pages: 241-260
Number of pages: 20
Related papers
50 items in total
  • [41] State of the art versus classical clustering for unsupervised word sense disambiguation
    Marius Popescu
    Florentina Hristea
    [J]. Artificial Intelligence Review, 2011, 35 : 241 - 264
  • [42] An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages
    Ustalov, Dmitry
    Teslenko, Denis
    Panchenko, Alexander
    Chernoskutov, Mikhail
    Biemann, Chris
    Ponzetto, Simone Paolo
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1018 - 1022
  • [43] Selecting Training Data for Unsupervised Domain Adaptation in Word Sense Disambiguation
    Komiya, Kanako
    Sasaki, Minoru
    Shinnou, Hiroyuki
    Kotani, Yoshiyuki
    Okumura, Manabu
    [J]. PRICAI 2016: TRENDS IN ARTIFICIAL INTELLIGENCE, 2016, 9810 : 220 - 232
  • [44] Combining supervised and unsupervised lexical knowledge methods for word sense disambiguation
    Agirre, E
    Rigau, G
    Padró, L
    Atserias, J
    [J]. COMPUTERS AND THE HUMANITIES, 2000, 34 (1-2): 103 - 108
  • [45] An Experimental Study on Unsupervised Graph-based Word Sense Disambiguation
    Tsatsaronis, George
    Varlamis, Iraklis
    Norvag, Kjetil
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2010, 6008 : 184 - +
  • [46] Syntactic and semantic disambiguation of numeral strings using an n-gram method
    Min, KH
    Wilson, WH
    Moon, YJ
    [J]. AI 2005: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2005, 3809 : 82 - 91
  • [47] A language independent n-gram model for word segmentation
    Kang, Seung-Shik
    Hwang, Kyu-Baek
    [J]. Lect. Notes Comput. Sci, 1600, (557-565):
  • [48] A language independent n-gram model for word segmentation
    Kang, Seung-Shik
    Hwang, Kyu-Baek
    [J]. AI 2006: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4304 : 557 - +
  • [49] Bag-Of-Word normalized n-gram models
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1594 - 1597
  • [50] Polish Word Recognition Based on n-Gram Methods
    Wojcicki, Piotr
    Zientarski, Tomasz
    [J]. IEEE ACCESS, 2024, 12 : 49817 - 49825