Adaptive document clustering based on query-based similarity

Cited by: 6
Authors
Na, Seung-Hoon [1]
Kang, In-Su [1]
Lee, Jong-Hyeok [1]
Affiliations
[1] Pohang Univ Sci & Technol, Div Elect & Comp Engn, Pohang 790784, South Korea
Keywords
adaptive document clustering; query-based similarity; cluster-based retrieval; language modeling approach
DOI
10.1016/j.ipm.2006.08.008
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In information retrieval, cluster-based retrieval is a well-known approach to the problem of term mismatch. Clustering requires similarity information between documents, which is difficult to compute in a feasible amount of time. Adaptive document clustering schemes have been investigated to resolve this problem, but their theoretical basis has not been fully explored. We provide a conceptual view of adaptive document clustering based on query-based similarities, regarding the user's query as a concept. Under this view, an adaptive document clustering scheme can be seen as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. The evaluation results show that retrieval based on query-based similarities significantly improves on the baseline while being comparable to the other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering. (c) 2006 Elsevier Ltd. All rights reserved.
Pages: 887-901
Number of pages: 15
Related papers
50 in total
  • [1] Improving query-based summarization using document graphs
    Mohamed, Ahmed A.
    Rajasekaran, Sanguthevar
    [J]. 2006 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY, VOLS 1 AND 2, 2006, : 408 - +
  • [2] Query-based inter-document similarity using probabilistic co-relevance model
    Na, Seung-Hoon
    Kang, In-Su
    Lee, Jong-Hyeok
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 684 - +
  • [3] Query-Based Metrics for Evaluating and Comparing Document Schemas
    Kuszera, Evandro Miguel
    Peres, Leticia M.
    Del Fabro, Marcos Didonet
    [J]. ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2020, 2020, 12127 : 530 - 545
  • [4] QUBIC: An adaptive approach to query-based recommendation
    Li, Lin
    Zhong, Luo
    Yang, Zhenglu
    Kitsuregawa, Masaru
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2013, 40 (03) : 555 - 587
  • [5] Adaptive query-based sampling of distributed collections
    Baillie, Mark
    Azzopardi, Leif
    Crestani, Fabio
    [J]. STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2006, 4209 : 316 - 328
  • [6] QUBiC: An adaptive approach to query-based recommendation
    Lin Li
    Luo Zhong
    Zhenglu Yang
    Masaru Kitsuregawa
    [J]. Journal of Intelligent Information Systems, 2013, 40 : 555 - 587
  • [7] Document Clustering Based on Fuzzy Similarity
    Zhou, Jingli
    Nie, Xuejun
    Qin, Leihua
    Zhu, Jianfeng
    [J]. APPLIED MECHANICS AND MECHANICAL ENGINEERING, PTS 1-3, 2010, 29-32 : 2620 - 2626
  • [8] Query-based Multi-document Summarization using Non-negative Semantic Feature and NMF Clustering
    Park, Sun
    Cha, ByungRae
    [J]. NCM 2008: 4TH INTERNATIONAL CONFERENCE ON NETWORKED COMPUTING AND ADVANCED INFORMATION MANAGEMENT, VOL 2, PROCEEDINGS, 2008, : 609 - 614
  • [9] Gene clustering by using query-based self-organizing maps
    Chang, Ray-I
    Chu, Chih-Chun
    Wu, Yu-Ying
    Chen, Yen-Liang
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2010, 37 (09) : 6689 - 6694
  • [10] A query-based quantum eigensolver
    Jin, Shan
    Wu, Shaojun
    Zhou, Guanyu
    Li, Ying
    Li, Lvzhou
    Li, Bo
    Wang, Xiaoting
    [J]. Quantum Engineering, 2020, 2 (03)