Term frequency - function of document frequency: a new term weighting scheme for enterprise information retrieval

Cited by: 14
Authors
Zhang, Hui [1 ]
Wang, Deqing [1 ]
Wu, Wenjun [1 ]
Hu, Hongping [1 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
enterprise information retrieval; term weighting scheme; term frequency; function of document frequency; relevance ranking; PERFORMANCE;
DOI
10.1080/17517575.2012.665945
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In today's business environment, enterprises are under increasing pressure to process the vast amount of data they produce every day. One approach is to focus on business intelligence (BI) applications and to increase commercial added value through such business analytics activities. Term weighting, which is used to convert documents into vectors in the term space, is a vital task in enterprise Information Retrieval (IR), text categorisation, text analytics, etc. When determining the weight of a term in a document, the traditional TF-IDF scheme considers only the term's occurrence frequency within the document and across the entire set of documents, so some meaningful terms do not receive an appropriate weight. In this article, we propose a new term weighting scheme called Term Frequency - Function of Document Frequency (TF-FDF) to address this issue. Instead of using a monotonically decreasing function such as Inverse Document Frequency, FDF presents a convex function that dynamically adjusts weights according to the significance of the words in a document set. This function can be manually tuned based on the distribution of the most meaningful words, which semantically represent the document set. Our experiments show that TF-FDF achieves a higher Normalised Discounted Cumulative Gain in IR than TF-IDF and its variants, and improves the accuracy of relevance ranking of the IR results.
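The contrast the abstract draws between IDF and a tunable function of document frequency can be illustrated with a minimal sketch. The abstract does not give the actual FDF formula, so `bump_fdf` below, along with its `peak` and `width` parameters and the toy corpus, is a purely hypothetical stand-in chosen only to show a non-monotonic weighting; it is not the authors' function. Only `idf` is the standard definition.

```python
import math
from collections import Counter


def idf(df, n_docs):
    """Standard inverse document frequency: monotonically decreasing in df."""
    return math.log(n_docs / df)


def bump_fdf(df, n_docs, peak=0.5, width=0.2):
    """HYPOTHETICAL non-monotonic function of document frequency.

    Not the FDF from the paper (the abstract does not state its form).
    It simply gives the largest weight to terms whose document-frequency
    ratio is near `peak`, an assumed tuning parameter.
    """
    ratio = df / n_docs
    return math.exp(-((ratio - peak) ** 2) / (2 * width ** 2))


def weight_documents(docs, df_function):
    """Weight each term as raw term frequency times df_function(df, N)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {term: count * df_function(doc_freq[term], n_docs)
             for term, count in tf.items()}
        )
    return weighted


# Toy corpus (invented for illustration): each document is a list of tokens.
docs = [
    ["invoice", "payment", "report"],
    ["invoice", "contract", "report"],
    ["invoice", "payment", "audit"],
    ["invoice", "meeting", "report"],
]

tfidf = weight_documents(docs, idf)
tffdf = weight_documents(docs, bump_fdf)
```

In this toy corpus, IDF ranks the rarest term in the third document ("audit") above "payment", while the hypothetical bump function ranks "payment" higher because its document frequency sits at the assumed peak. Swapping `df_function` is the only change between the two schemes, which mirrors the "TF times a function of DF" framing of the title.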
Pages: 433-444
Page count: 12