Word sense discrimination in information retrieval: A spectral clustering-based approach

Cited by: 17
Authors
Chifu, Adrian-Gabriel [1]
Hristea, Florentina [2]
Mothe, Josiane [3]
Popescu, Marius [2]
Affiliations
[1] Univ Toulouse 3, Univ Toulouse, CNRS, IRIT UMR5505, F-31062 Toulouse 9, France
[2] Univ Bucharest, Fac Math & Comp Sci, Dept Comp Sci, RO-010014 Bucharest, Romania
[3] Univ Toulouse, Ecole Super Professorat & Educ, CNRS, IRIT UMR5505, F-31062 Toulouse 9, France
Keywords
Information retrieval; Word sense disambiguation; Word sense discrimination; Spectral clustering; High precision
DOI
10.1016/j.ipm.2014.10.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Word sense ambiguity has been identified as a cause of poor precision in information retrieval (IR) systems. Word sense disambiguation and discrimination methods have been defined to help systems choose which documents should be retrieved in relation to an ambiguous query. However, the only approaches that show a genuine benefit for word sense discrimination or disambiguation in IR are generally supervised ones. In this paper we propose a new unsupervised method that uses word sense discrimination in IR. The method we develop is based on spectral clustering and reorders an initially retrieved document list by boosting documents that are semantically similar to the target query. For several TREC ad hoc collections we show that our method is useful in the case of queries which contain ambiguous terms. We are interested in improving the level of precision after 5, 10 and 30 retrieved documents (P@5, P@10, P@30) respectively. We show that precision can be improved by 8% above current state-of-the-art baselines. We also focus on poor performing queries. © 2014 Elsevier Ltd. All rights reserved.
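The P@5, P@10 and P@30 measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function name `precision_at_k`, the document identifiers and the relevance set are all hypothetical, and the sketch assumes the common convention of dividing by k even when fewer than k documents were retrieved.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant.

    Assumption: the denominator is always k, so a ranking shorter
    than k is penalised for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranked list and relevance judgments, for illustration only.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}

p_at_5 = precision_at_k(ranked, relevant, 5)
```

A re-ranking step that moves relevant documents toward the top of the list raises these values at small k, which is the effect the abstract reports measuring.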
Pages: 16-31
Number of pages: 16
Related papers
50 entries in total
  • [31] A clustering-based approach to vortex extraction
    Deng, Liang
    Wang, Yueqing
    Chen, Cheng
    Liu, Yang
    Wang, Fang
    Liu, Jie
    [J]. JOURNAL OF VISUALIZATION, 2020, 23 (03) : 459 - 474
  • [32] ICN clustering-based approach for VANETs
    Fourati, Lamia Chaari
    Ayed, Samiha
    Ben Rejeb, Mohamed Ali
    [J]. ANNALS OF TELECOMMUNICATIONS, 2021, 76 (9-10) : 745 - 757
  • [33] ICN clustering-based approach for VANETs
    Lamia Chaari Fourati
    Samiha Ayed
    Mohamed Ali Ben Rejeb
    [J]. Annals of Telecommunications, 2021, 76 : 745 - 757
  • [34] A Clustering-Based Approach to Ontology Alignment
    Duan, Songyun
    Fokoue, Achille
    Srinivas, Kavitha
    Byrne, Brian
    [J]. SEMANTIC WEB - ISWC 2011, PT I, 2011, 7031 : 146 - +
  • [35] A clustering-based approach to vortex extraction
    Liang Deng
    Yueqing Wang
    Cheng Chen
    Yang Liu
    Fang Wang
    Jie Liu
    [J]. Journal of Visualization, 2020, 23 : 459 - 474
  • [36] Combining Probabilistic and Translation-Based Models for Information Retrieval Based on Word Sense Annotations
    Wolf, Elisabeth
    Bernhard, Delphine
    Gurevych, Iryna
    [J]. MULTILINGUAL INFORMATION ACCESS EVALUATION I: TEXT RETRIEVAL EXPERIMENTS, 2010, 6241 : 120 - +
  • [37] Word sense disambiguation for cross-language information retrieval
    Liu, MX
    Diamond, T
    Diekema, AR
    [J]. 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000, : B35 - B40
  • [38] TDSS: A New Word Sense Representation Framework for Information Retrieval
    Chen, Liwei
    Feng, Yansong
    Zhao, Dongyan
    [J]. NATURAL LANGUAGE UNDERSTANDING AND INTELLIGENT APPLICATIONS (NLPCC 2016), 2016, 10102 : 63 - 75
  • [39] Word clustering for collocation-based word sense disambiguation
    Jin, Peng
    Sun, Xu
    Wu, Yunfang
    Yu, Shiwen
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2007, 4394 : 267 - +
  • [40] Feature selection for spectral clustering: to help or not to help spectral clustering when performing sense discrimination for IR?
    Chifu, Adrian-Gabriel
    Hristea, Florentina
    [J]. OPEN COMPUTER SCIENCE, 2018, 8 (01) : 218 - 227