A novel term weighting scheme based on discrimination power obtained from past retrieval results

Cited by: 16
Authors
Song, Sa-kwang [1, 2]
Myaeng, Sung Hyon [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Taejon 305701, South Korea
[2] Korea Inst Sci & Technol Informat, Taejon 305806, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Term weighting; Evidential weight; Discrimination power; Language model; Probabilistic model;
DOI
10.1016/j.ipm.2012.03.004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on the hypothesis that a term's role in accumulated past retrieval sessions affects its general importance, regardless of any particular query. The method exploits the availability of past retrieval results, consisting of the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term's evidential weight, as we propose in this paper, depends on the degree to which the term's mean frequency differs between the distributions of relevant and non-relevant documents in past results. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Experimental results on standard test collections show that the proposed term weighting scheme improves on conventional TF*IDF and language-model-based schemes. This indicates that evidential term weights capture a new aspect of term importance and complement the collection statistics underlying TF*IDF. We also show how the proposed scheme, based on the notion of evidential weights, is related to well-known weighting schemes based on language modeling and probabilistic models. (C) 2012 Elsevier Ltd. All rights reserved.
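The abstract describes the evidential weight only qualitatively, so the Python sketch here illustrates one possible reading of it and is not the paper's formula. It assumes that past results for queries containing a term are available as lists of retrieved documents with their rank, similarity score, and relevance judgment, and that the term's frequency in each document is known. The weighting of each document by `similarity / rank`, the plain difference of weighted means, and the `combined_weight` function with its `alpha` parameter are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrievedDoc:
    """One document returned for a past query that contained the term."""
    doc_id: str
    rank: int          # 1-based position in the past result list
    similarity: float  # score the retrieval system assigned
    relevant: bool     # past relevance judgment


def doc_weight(d: RetrievedDoc) -> float:
    # Hypothetical weighting: higher-ranked, higher-scoring documents count more.
    return d.similarity / d.rank


def weighted_mean_tf(docs: List[RetrievedDoc], tf: Dict[str, int]) -> float:
    """Weighted mean frequency of the term over a set of retrieved documents."""
    total = sum(doc_weight(d) for d in docs)
    if total == 0:
        return 0.0
    return sum(doc_weight(d) * tf.get(d.doc_id, 0) for d in docs) / total


def evidential_weight(sessions: List[List[RetrievedDoc]], tf: Dict[str, int]) -> float:
    """Gap between relevant and non-relevant weighted mean term frequencies.

    `sessions` holds the past result lists for queries containing the term;
    `tf` maps each doc_id to the term's frequency in that document.
    """
    docs = [d for session in sessions for d in session]
    rel = [d for d in docs if d.relevant]
    nonrel = [d for d in docs if not d.relevant]
    if not rel or not nonrel:
        return 0.0  # no contrast to measure for this term
    return weighted_mean_tf(rel, tf) - weighted_mean_tf(nonrel, tf)


def combined_weight(tfidf: float, ew: float, alpha: float = 0.5) -> float:
    # Hypothetical combination: scale TF*IDF by the evidential weight,
    # clipped at zero so a large negative gap cannot flip the sign.
    return tfidf * max(0.0, 1.0 + alpha * ew)


if __name__ == "__main__":
    sessions = [
        [RetrievedDoc("d1", 1, 0.9, True), RetrievedDoc("d2", 2, 0.6, False)],
        [RetrievedDoc("d3", 1, 0.8, True), RetrievedDoc("d4", 3, 0.3, False)],
    ]
    tf = {"d1": 4, "d2": 1, "d3": 2, "d4": 0}
    print(round(evidential_weight(sessions, tf), 3))  # → 2.309
```

In this toy example the term occurs more often in the relevant documents than in the non-relevant ones, so its evidential weight is positive; a term spread evenly across both groups would get a weight near zero, which matches the intuition that it carries little discriminating evidence.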
Pages: 921-932
Number of pages: 12
Related papers (50 in total)
  • [1] Term weighting for information retrieval based on term's discrimination power
    Li, Qing
    Lee, Seungwoo
    Jung, Hanmin
    Lee, Yeong Su
    Cho, Jae-Hyun
    Song, Sa-kwang
    Multimedia Tools and Applications, 2014, 71(2): 769-781
  • [3] Improving Information Retrieval Through a Global Term Weighting Scheme
    Cuellar, Daniel
    Diaz, Elva
    Ponce-de-Leon-Senti, Eunice
    Pattern Recognition (MCPR 2015), 2015, 9116: 246-257
  • [4] A Part-Of-Speech term weighting scheme for biomedical information retrieval
    Wang, Yanshan
    Wu, Stephen
    Li, Dingcheng
    Mehrabi, Saeed
    Liu, Hongfang
    Journal of Biomedical Informatics, 2016, 63: 379-389
  • [5] A Novel Term Weighting Scheme for a Fuzzy Logic Based Intelligent Web Agent
    Gomez, Ariel
    Ropero, Jorge
    Leon, Carlos
    Carrasco, Alejandro
    ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. AIDSS: Artificial Intelligence and Decision Support Systems, 2008: 496-499
  • [6] A Novel Term Weighting Scheme MIDF for Text Categorization
    Deisy, C.
    Gowri, M.
    Baskar, S.
    Kalaiarasi, S. M. A.
    Ramraj, N.
    Journal of Engineering Science and Technology, 2010, 5(1): 94-107
  • [7] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    Informatica (Slovenia), 2022, 46(2): 259-268
  • [8] A novel term weighting scheme for automated text categorization
    Xu, Hongzhi
    Li, Chunping
    Proceedings of the 7th International Conference on Intelligent Systems Design and Applications, 2007: 759-764
  • [10] Graph-based term weighting for information retrieval
    Blanco, Roi
    Lioma, Christina
    Information Retrieval, 2012, 15(1): 54-92