Local ensemble learning from imbalanced and noisy data for word sense disambiguation

Cited by: 15
Authors
Krawczyk, Bartosz [1]
McInnes, Bridget T. [1]
Affiliation
[1] Virginia Commonwealth Univ, Dept Comp Sci, Richmond, VA 23284 USA
Keywords
Machine learning; Natural language processing; Imbalanced classification; Multi-class imbalance; Ensemble learning; One-class classification; Class label noise; Word sense disambiguation; Sampling approach; Classification; Algorithms
DOI
10.1016/j.patcog.2017.10.028
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Natural Language Processing plays a key role in man-machine interactions, allowing computers to understand and analyze human language. One of its more challenging sub-domains is word sense disambiguation, the task of automatically identifying the intended sense (or concept) of an ambiguous word based on the context in which the word is used. This requires proper feature extraction to capture specific data properties and a dedicated machine learning solution to allow for the accurate labeling of the appropriate sense. However, the pattern classification problem posed here is highly challenging, as we must deal with high-dimensional and multi-class imbalanced data that additionally may be corrupted with class label noise. To address these issues, we propose a local ensemble learning solution. It uses a one-class decomposition of the multi-class problem, assigning an ensemble of one-class classifiers to each of the distributions. The classifiers are trained on the basis of low-dimensional subsets of features and a kernel feature space transformation to obtain a more compact representation. Instance weighting is used to filter out potentially noisy instances and reduce overlapping among classes. Finally, a two-level classifier fusion technique is used to reconstruct the original multi-class problem. Our results show that the proposed learning approach displays robustness to both multi-class skewed distributions and class label noise, making it a useful tool for the considered task. (C) 2017 Elsevier Ltd. All rights reserved.
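The pipeline described in the abstract can be illustrated with a minimal, stdlib-only sketch. The sketch is built under loud assumptions and is not the authors' method. The real approach uses dedicated one-class learners and a kernel feature space transformation; the kernel step is omitted here. In its place, each one-class model is a hypothetical weighted centroid with a radius, fitted on a random low-dimensional feature subset. The instance weighting is a hypothetical distance-to-centroid heuristic that down-weights likely noisy instances. All names are illustrative only: `noise_weights`, `SubspaceOneClass`, `LocalOneClassEnsemble`, and the toy "bank" data. The sketch shows three steps:

- one-class decomposition, where each sense gets its own ensemble trained only on its own instances;
- averaging of support within each ensemble (first fusion level);
- an argmax across senses (second fusion level).

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _mean(X: List[Vector]) -> List[float]:
    return [sum(col) / len(X) for col in zip(*X)]


def _euclid(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def noise_weights(X: List[Vector]) -> List[float]:
    """Hypothetical heuristic: down-weight instances far from their class centroid."""
    c = _mean(X)
    d = [_euclid(x, c) for x in X]
    mean_d = sum(d) / len(d) or 1e-9
    return [1.0 / (1.0 + di / mean_d) for di in d]


class SubspaceOneClass:
    """Stand-in one-class model: weighted centroid and radius on a feature subset."""

    def __init__(self, features: List[int]):
        self.features = features

    def _proj(self, x: Vector) -> List[float]:
        return [x[i] for i in self.features]

    def fit(self, X: List[Vector], w: List[float]) -> "SubspaceOneClass":
        P = [self._proj(x) for x in X]
        total = sum(w)
        self.center = [sum(wi * p[j] for p, wi in zip(P, w)) / total
                       for j in range(len(self.features))]
        # Radius = weighted mean distance of the class instances to the centre.
        self.radius = sum(wi * _euclid(p, self.center) for p, wi in zip(P, w)) / total or 1e-9
        return self

    def support(self, x: Vector) -> float:
        # 1.0 at the centre, decaying with distance relative to the class radius.
        return math.exp(-_euclid(self._proj(x), self.center) / self.radius)


class LocalOneClassEnsemble:
    """Hypothetical sketch: one ensemble of one-class models per sense, two-level fusion."""

    def __init__(self, n_models: int = 5, subspace: int = 2, seed: int = 0):
        self.n_models, self.subspace = n_models, subspace
        self.rng = random.Random(seed)

    def fit(self, X: List[Vector], y: List[str]) -> "LocalOneClassEnsemble":
        n_feat = len(X[0])
        k = min(self.subspace, n_feat)
        self.models: Dict[str, List[SubspaceOneClass]] = {}
        for label in sorted(set(y)):
            # One-class decomposition: each sense sees only its own instances,
            # so a majority sense cannot dominate a minority sense's boundary.
            Xc = [x for x, yi in zip(X, y) if yi == label]
            w = noise_weights(Xc)
            self.models[label] = [
                SubspaceOneClass(self.rng.sample(range(n_feat), k)).fit(Xc, w)
                for _ in range(self.n_models)
            ]
        return self

    def predict(self, x: Vector) -> str:
        # Level 1: average the support inside each sense's ensemble.
        scores = {label: sum(m.support(x) for m in ms) / len(ms)
                  for label, ms in self.models.items()}
        # Level 2: choose the sense whose ensemble supports x most strongly.
        return max(scores, key=scores.get)


# Toy, made-up data for the ambiguous word "bank": 4 vs 2 instances (imbalanced).
X = [[0.0, 0.1, 0.0], [0.2, 0.0, 0.1], [0.1, 0.1, 0.2], [0.0, 0.2, 0.1],
     [5.0, 5.1, 4.9], [5.2, 4.8, 5.0]]
y = ["river", "river", "river", "river", "finance", "finance"]
clf = LocalOneClassEnsemble(n_models=5, subspace=2, seed=1).fit(X, y)
print(clf.predict([5.1, 5.0, 5.0]))  # finance
print(clf.predict([0.1, 0.0, 0.1]))  # river
```

The decomposition is chosen because each sense's model is fitted independently. Under that design, skew in the number of instances per sense does not directly shift a shared decision boundary. A faithful reproduction would instead need three things: proper one-class learners, the kernel feature mapping, and the weighting and fusion rules given in the paper itself.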
Pages: 103-119
Number of pages: 17