Exploiting concept clusters for content-based information retrieval

Cited by: 28
Authors
Kang, BY [1]
Kim, DW
Lee, SJ
Affiliations
[1] Kyungpook Natl Univ, Dept Comp Engn, Taegu 702701, South Korea
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
Keywords
information retrieval; indexing; term frequency; weighting function
DOI
10.1016/j.ins.2004.03.013
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Current approaches to index weighting for information retrieval from texts are based on statistical analysis of the texts' contents. A key shortcoming of these indexing schemes, which consider only the terms in a document, is that they cannot extract semantically exact indexes that represent the semantic content of a document. To address this issue, we propose a new indexing formalism that considers not only the terms in a document, but also the concepts. In the proposed method, concepts are extracted by exploiting clusters of semantically related terms, referred to as concept clusters. Through experiments on the TREC-2 collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially with regard to the highest-ranked documents. Moreover, the index term dimension was 53.3% lower for the proposed method than for the TF-based method, which is expected to significantly reduce document search time in a real environment. © 2004 Elsevier Inc. All rights reserved.
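The contrast between a term-frequency index and a concept-level index can be illustrated with a minimal sketch. This is not the paper's weighting function, which the abstract does not specify. The cluster names, the cluster contents, the sample document, and the choice to keep unclustered terms as their own entries are all assumptions made for illustration. The sketch only shows why grouping semantically related terms tends to shrink the index dimension.

```python
from collections import Counter


def tf_index(tokens):
    """Plain term-frequency index: one entry per distinct term."""
    return Counter(tokens)


def concept_index(tokens, clusters):
    """Hypothetical concept-level index.

    Terms belonging to the same cluster are merged into a single
    concept entry. Terms outside every cluster keep their own entry
    (an assumption, not something the abstract states).
    """
    term_to_concept = {
        term: name for name, terms in clusters.items() for term in terms
    }
    index = Counter()
    for token in tokens:
        index[term_to_concept.get(token, token)] += 1
    return index


# Hypothetical clusters of semantically related terms.
clusters = {
    "FINANCE": {"stock", "share", "market"},
    "VEHICLE": {"car", "truck"},
}
doc = "stock market share car truck stock price".split()

tf = tf_index(doc)
ci = concept_index(doc, clusters)

print(len(tf), dict(tf))  # number of distinct terms and their counts
print(len(ci), dict(ci))  # number of concept-level entries and their counts
```

In this toy document the term-level index has six entries while the concept-level index has three, because five of the terms fall into two clusters. The size of the reduction on a real collection depends entirely on how the clusters are built.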
Pages: 443 - 462
Page count: 20
Related papers
50 in total
  • [1] Enhancing Medical Information Retrieval by Exploiting a Content-Based Recommender Method
    Li, Wei
    Jones, Gareth J. F.
    [J]. EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION, 2015, 9283 : 142 - 153
  • [2] Exploiting unlabeled data in content-based image retrieval
    Zhou, ZH
    Chen, KJ
    Jiang, Y
    [J]. MACHINE LEARNING: ECML 2004, PROCEEDINGS, 2004, 3201 : 525 - 536
  • [3] Exploiting Ontology for Concept Based Information Retrieval
    Sharan, Aditi
    Joshi, Manju Lata
    Pandey, Anupama
    [J]. INFORMATION SYSTEMS FOR INDIAN LANGUAGES, 2011, 139 : 157 - 164
  • [4] Content-based indexing and retrieval of visual information
    Chang, SF
    [J]. IEEE SIGNAL PROCESSING MAGAZINE, 1997, 14 (04) : 45 - 48
  • [5] Content-based information retrieval and digital libraries
    Wan, Gary
    Liu, Zao
    [J]. INFORMATION TECHNOLOGY AND LIBRARIES, 2008, 27 (01) : 41 - 47
  • [6] Issues in content-based music information retrieval
    Lippincott, A
    [J]. JOURNAL OF INFORMATION SCIENCE, 2002, 28 (02) : 137 - 142
  • [7] Forming and searching content-based hierarchical agent clusters in distributed information retrieval systems
    Zhang, Haizheng
    Lesser, Victor
    [J]. Web Intelligence and Agent Systems, 2006, 4 (04) : 353 - 370
  • [8] Exploiting Evolutionary approaches for Content-Based Medical Image Retrieval
    Rocha, Reginaldo
    Saito, Priscila T. M.
    Bugatti, Pedro H.
    [J]. 2015 IEEE 28TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS (CBMS), 2015 : 370 - 371
  • [9] Combining concept- with content-based multimedia retrieval
    Windhouwer, M
    van Zwol, R
    [J]. INTELLIGENT SEARCH ON XML DATA: APPLICATIONS, LANGUAGES, MODELS IMPLEMENTATIONS AND BENCHMARKS, 2003, 2818 : 217 - 230
  • [10] Content-Based Image Retrieval: Concept and Current Practices
    Hiwale, Sushant Shrikant
    Dhotre, Dhanraj
    [J]. 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, SIGNALS, COMMUNICATION AND OPTIMIZATION (EESCO), 2015