Word sense discrimination in information retrieval: A spectral clustering-based approach

Cited by: 17
Authors
Chifu, Adrian-Gabriel [1]
Hristea, Florentina [2]
Mothe, Josiane [3]
Popescu, Marius [2]
Affiliations
[1] Univ Toulouse 3, Univ Toulouse, CNRS, IRIT UMR5505, F-31062 Toulouse 9, France
[2] Univ Bucharest, Fac Math & Comp Sci, Dept Comp Sci, RO-010014 Bucharest, Romania
[3] Univ Toulouse, Ecole Super Professorat & Educ, CNRS, IRIT UMR5505, F-31062 Toulouse 9, France
Keywords
Information retrieval; Word sense disambiguation; Word sense discrimination; Spectral clustering; High precision
DOI
10.1016/j.ipm.2014.10.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10 and P@30, respectively). We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poorly performing queries. © 2014 Elsevier Ltd. All rights reserved.
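The precision measures named in the abstract (P@5, P@10, P@30) can be illustrated with a minimal sketch. This is the standard definition of precision at cutoff k, not code from the paper; the function name `precision_at_k`, the document identifiers, and the relevance judgments below are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_doc_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are judged relevant.

    Following the usual convention, the denominator is k even when the
    ranking contains fewer than k documents.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranked list and relevance judgments (illustration only).
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d3", "d1", "d4", "d5"}

p5 = precision_at_k(ranking, relevant, 5)    # 3 of the top 5 are relevant
p10 = precision_at_k(ranking, relevant, 10)  # 4 of the top 10 are relevant
print(f"P@5 = {p5:.2f}, P@10 = {p10:.2f}")
```

A re-ranking method such as the one the abstract describes would aim to move relevant documents toward the top of `ranking`, which raises these values at small cutoffs.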
Pages: 16-31
Number of pages: 16