Query-Sensitive Similarity Measures for Information Retrieval

Authors
Anastasios Tombros
C.J. van Rijsbergen
Affiliation
[1] University of Glasgow, Department of Computing Science
Keywords
Information retrieval; Document clustering; Similarity measures; Nearest neighbor searching
DOI
Not available
Abstract
The application of document clustering to information retrieval has been motivated by the potential effectiveness gains postulated by the cluster hypothesis. The hypothesis states that relevant documents tend to be highly similar to each other and therefore tend to appear in the same clusters. In this paper we propose an axiomatic view of the hypothesis by suggesting that documents relevant to the same query (co-relevant documents) display an inherent similarity to each other that is dictated by the query itself. Because of this inherent similarity, the cluster hypothesis should be valid for any document collection. Our research describes an attempt to devise means by which this similarity can be detected. We propose the use of query-sensitive similarity measures that bias interdocument relationships toward pairs of documents that jointly possess attributes expressed in a query. We experimentally tested three query-sensitive measures against conventional ones that do not take the query into account, and we also examined the comparative effectiveness of the three query-sensitive measures. We calculated interdocument relationships for varying numbers of top-ranked documents for six document collections. Our results show a consistent and significant increase in the number of relevant documents that become nearest neighbors of any given relevant document when query-sensitive measures are used. These results suggest that the effectiveness of a cluster-based information retrieval system has the potential to increase through the use of query-sensitive similarity measures.
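The abstract does not give the formulas for the three query-sensitive measures, so the following is only a minimal sketch of the general idea, under stated assumptions: the measure is taken to be the product of the conventional cosine similarity between two documents and the cosine between the query and the terms the two documents share. The function names, the raw term-frequency weighting, and the averaging of shared-term weights are all hypothetical choices made for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Conventional cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def common_terms(d1, d2):
    """Vector of terms present in both documents.

    Each shared term gets the mean of its two weights (an assumption).
    """
    return {t: (d1[t] + d2[t]) / 2 for t in d1.keys() & d2.keys()}


def query_sensitive_similarity(d1, d2, query):
    """Bias similarity toward pairs that jointly contain query terms.

    Assumed form: cosine(d1, d2) * cosine(common_terms(d1, d2), query).
    """
    return cosine(d1, d2) * cosine(common_terms(d1, d2), query)


query = vectorize("document clustering")
doc_a = vectorize("clustering of document collections")
doc_b = vectorize("document clustering for retrieval")
doc_c = vectorize("retrieval of image collections")

# Conventional similarity, which ignores the query.
print(cosine(doc_a, doc_b), cosine(doc_a, doc_c))
# Query-sensitive similarity for the same two pairs.
print(query_sensitive_similarity(doc_a, doc_b, query),
      query_sensitive_similarity(doc_a, doc_c, query))
```

In this toy example both pairs have the same conventional cosine similarity, because each pair shares two of four terms. Only the pair whose shared terms also appear in the query keeps a non-zero query-sensitive score, which is the kind of bias toward jointly possessed query attributes that the abstract describes.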
Pages: 617-642
Number of pages: 25