Evolving general term-weighting schemes for information retrieval: Tests on larger collections

Cited by: 10
Authors
Cummins, R [1]
O'Riordan, C [1]
Affiliations
[1] Natl Univ Ireland Univ Coll Galway, Dept Informat Technol, Galway, Ireland
Keywords
genetic programming; information retrieval; term-weighting schemes
DOI
10.1007/s10462-005-9001-y
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector space model is one such model in which the weights assigned to the document terms are of crucial importance to the accuracy of the retrieval system. This paper describes a genetic programming framework used to automatically determine term-weighting schemes that achieve a high average precision. These schemes are tested on standard test collections and are shown to perform as well as, and often better than, the modern BM25 weighting scheme. We present an analysis of the schemes evolved to explain the increase in performance. Furthermore, we show that the global (collection wide) part of the evolved weighting schemes also increases average precision over idf on larger TREC data. These global weighting schemes are shown to adhere to Luhn's resolving power as middle frequency terms are assigned the highest weight. However, the complete weighting schemes evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes it is necessary to evolve these on large collections.
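For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of idf and of a common textbook form of the BM25 term weight, split into a global (collection-wide) part and a local (within-document) part. It is not taken from the paper: the function names, the toy collection, and the parameter defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and the exact BM25 variant the authors compared against may differ.

```python
import math
from collections import Counter


def idf(df, n_docs):
    """Plain inverse document frequency: log(N / df)."""
    return math.log(n_docs / df)


def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """One common form of the BM25 weight for a term in a document.

    k1 and b are conventional defaults (an assumption, not from the paper).
    """
    # Global part: depends only on collection statistics.
    # Note it turns negative when df > n_docs / 2.
    global_w = math.log((n_docs - df + 0.5) / (df + 0.5))
    # Local part: saturating term frequency with length normalisation.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    local_w = tf * (k1 + 1) / (tf + norm)
    return global_w * local_w


def score(query, doc, docs):
    """Sum BM25 weights of the query terms that occur in doc."""
    n_docs = len(docs)
    avg_doc_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    total = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df == 0 or tf[term] == 0:
            continue  # term absent from the collection or from this document
        total += bm25_weight(tf[term], df, n_docs, len(doc), avg_doc_len)
    return total


# Hypothetical toy collection of tokenised documents.
docs = [
    ["genetic", "programming", "retrieval"],
    ["retrieval", "model"],
    ["vector", "space", "model"],
    ["term", "weighting"],
]
```

A genetic programming approach such as the one the abstract describes would search over alternative expressions for the global and local parts rather than fixing them as above.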
Pages: 277 - 299
Page count: 23
Related papers
50 in total
  • [21] The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization
    Sanchez-Gomez, Jesus M.
    Vega-Rodriguez, Miguel A.
    Perez, Carlos J.
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
  • [22] Graph-based term weighting for information retrieval
    Blanco, Roi
    Lioma, Christina
    INFORMATION RETRIEVAL, 2012, 15 (01) : 54 - 92
  • [23] Part of Speech Based Term Weighting for Information Retrieval
    Lioma, Christina
    Blanco, Roi
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2009, 5478 : 412 - +
  • [24] Graph-based term weighting for information retrieval
    Roi Blanco
    Christina Lioma
    Information Retrieval, 2012, 15 : 54 - 92
  • [25] Improve precategorized collection retrieval by using supervised term weighting schemes
    Zhao, Y
    Karypis, G
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, PROCEEDINGS, 2002, : 16 - 21
  • [26] Term Weighting Schemes Experiment Based on SVD for Malay Text Retrieval
    Ab Samat, Nordianah
    Murad, Masrah Azrifah Azmi
    Abdullah, Muhamad Taufik
    Atan, Rodziah
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (10) : 357 - 361
  • [27] TERM WEIGHTING IN INFORMATION-RETRIEVAL USING THE TERM PRECISION MODEL
    YU, CT
    LAM, K
    SALTON, G
    JOURNAL OF THE ACM, 1982, 29 (01) : 152 - 170
  • [28] Term weighting for information retrieval based on term's discrimination power
    Li, Qing
    Lee, Seungwoo
    Jung, Hanmin
    Lee, Yeong Su
    Cho, Jae-Hyun
    Song, Sa-kwang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 71 (02) : 769 - 781
  • [29] Term weighting for information retrieval based on term’s discrimination power
    Qing Li
    Seungwoo Lee
    Hanmin Jung
    Yeong Su Lee
    Jae-Hyun Cho
    Sa-kwang Song
    Multimedia Tools and Applications, 2014, 71 : 769 - 781
  • [30] Information-theoretic Term Weighting Schemes for Document Clustering
    Ke, Weimao
    JCDL'13: PROCEEDINGS OF THE 13TH ACM/IEEE-CS JOINT CONFERENCE ON DIGITAL LIBRARIES, 2013, : 143 - 152