Exploiting concept clusters for content-based information retrieval

Cited by: 28
Authors
Kang, BY [1]
Kim, DW
Lee, SJ
Affiliations
[1] Kyungpook Natl Univ, Dept Comp Engn, Taegu 702701, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
Keywords
information retrieval; indexing; term frequency; weighting function
DOI
10.1016/j.ins.2004.03.013
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current approaches to index weighting for information retrieval from texts are based on statistical analysis of the texts' contents. A key shortcoming of these indexing schemes, which consider only the terms in a document, is that they cannot extract semantically exact indexes that represent the semantic content of a document. To address this issue, we propose a new indexing formalism that considers not only the terms in a document, but also the concepts. In the proposed method, concepts are extracted by exploiting clusters of terms that are semantically related, referred to as concept clusters. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the highest-ranked documents. Moreover, the index term dimension was 53.3% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment. (C) 2004 Elsevier Inc. All rights reserved.
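The contrast the abstract draws between term-based and concept-based indexing can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the cluster-extraction procedure or the weighting function, so the hand-written clusters, the function names, and the simple count-based weights below are all assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical concept clusters (assumption): each concept label maps to a
# set of semantically related terms. The paper derives its clusters from the
# text; here they are written by hand.
CONCEPT_CLUSTERS = {
    "finance": {"bank", "loan", "interest", "credit"},
    "automobile": {"car", "engine", "vehicle"},
}


def tf_index(tokens):
    """Plain term-frequency index: one dimension per distinct term."""
    return Counter(tokens)


def concept_index(tokens, clusters=CONCEPT_CLUSTERS):
    """Collapse terms of the same cluster into one concept dimension.

    Terms that belong to no cluster are kept as their own dimension.
    """
    term_to_concept = {
        term: concept for concept, terms in clusters.items() for term in terms
    }
    return Counter(term_to_concept.get(tok, tok) for tok in tokens)


tokens = "bank loan interest car engine bank report".split()
tf = tf_index(tokens)
ci = concept_index(tokens)

print(len(tf), dict(tf))  # number of term dimensions and their counts
print(len(ci), dict(ci))  # number of concept dimensions and their counts
```

In this toy document the six distinct terms collapse to three index dimensions, which is the kind of dimension reduction the abstract reports, although the 53.3% figure there comes from the authors' own experiments and not from anything in this sketch.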
Pages: 443 - 462
Page count: 20
Related papers
50 in total
  • [21] TRENDS: A Content-Based Information Retrieval System for Designers
    Bouchard, Carole
    Omhover, Jean-Francois
    Mougenot, Celine
    Aoussat, Ameziane
    Westerman, Stephen J.
    [J]. DESIGN COMPUTING AND COGNITION '08, 2008, : 593 - +
  • [22] Content-based information retrieval by group theoretical methods
    Clausen, M
    Kurth, F
    [J]. COMPUTATIONAL NONCOMMUTATIVE ALGEBRA AND APPLICATIONS, 2004, 136 : 29 - 55
  • [23] Special issue on: Content-based visual information retrieval
    Vasconcelos, N
    Kunt, M
    [J]. SIGNAL PROCESSING, 2002, 82 (03)
  • [24] Advancing content-based image retrieval by exploiting image color and region features
    Gong, YH
    [J]. MULTIMEDIA SYSTEMS, 1999, 7 (06) : 449 - 457
  • [25] Exploiting the Hessian matrix for content-based retrieval of volume-data features
    J. Hladůvka
    E. Gröller
    [J]. The Visual Computer, 2002, 18 : 207 - 217
  • [26] A Multimodal Information Collector for Content-Based Image Retrieval System
    Zhang, He
    Sjoberg, Mats
    Laaksonen, Jorma
    Oja, Erkki
    [J]. NEURAL INFORMATION PROCESSING, PT III, 2011, 7064 : 737 - 746
  • [27] Relevance feature mapping for content-based multimedia information retrieval
    Zhou, Guang-Tong
    Ting, Kai Ming
    Liu, Fei Tony
    Yin, Yilong
    [J]. PATTERN RECOGNITION, 2012, 45 (04) : 1707 - 1720
  • [28] Content-based multimedia information retrieval: State of the art and challenges
    Lew, Michael S.
    Sebe, Nicu
    Djeraba, Chabane
    Jain, Ramesh
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2006, 2 (01) : 1 - 19
  • [29] Exploiting the Hessian matrix for content-based retrieval of volume-data features
    Hladuvka, J
    Gröller, E
    [J]. VISUAL COMPUTER, 2002, 18 (04) : 207 - 217
  • [30] Advancing content-based image retrieval by exploiting image color and region features
    Yihong Gong
    [J]. Multimedia Systems, 1999, 7 : 449 - 457