Optimized TF-IDF Algorithm with the Adaptive Weight of Position of Word

Cited by: 0
Authors
Chen, Jie [1]
Chen, Cai [1]
Liang, Yi [1]
Affiliations
[1] Beijing University of Technology, Faculty of Information Technology, Beijing, People's Republic of China
Keywords
text feature extraction; adaptive weight; weight of position; Term Frequency-Inverse Document Frequency (TF-IDF)
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The classical TF-IDF algorithm considers only term frequency and inverse document frequency and ignores other features of a word. Based on an analysis of Chinese expression habits, this paper proposes a TF-IDF variant with an adaptive word-position weight, called the TF-IDF-AP algorithm. TF-IDF-AP dynamically determines the position weight of a word according to where the word appears. The paper uses the vector space model (VSM) and designs a comparative experiment on Chinese document clustering. The results show that the F-measure of TF-IDF-AP improves by 12.9% compared with the classical TF-IDF algorithm.
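The abstract does not give the TF-IDF-AP formula, so the following is only a minimal sketch of the general idea: classical TF-IDF next to a variant in which each occurrence of a word contributes a position-dependent weight instead of a flat count of 1. The function names, the `boost` parameter, and the linear `position_weight` rule are all assumptions made for illustration; they are not the authors' adaptive weighting scheme.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Classical TF-IDF. `docs` is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def position_weight(index, length, boost):
    """Hypothetical rule (NOT the paper's formula): earlier tokens weigh more, decaying linearly."""
    return 1.0 + boost * (1.0 - index / length)


def tf_idf_positional(docs, boost=0.5):
    """TF-IDF where each occurrence adds its position weight to the term count (illustrative only)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        weighted_counts = defaultdict(float)
        for index, term in enumerate(doc):
            weighted_counts[term] += position_weight(index, len(doc), boost)
        weights.append({
            term: (w / len(doc)) * math.log(n_docs / df[term])
            for term, w in weighted_counts.items()
        })
    return weights
```

With `boost=0` the positional variant reduces to classical TF-IDF, which makes it easy to compare the two weightings on the same tokenized corpus before feeding the vectors to a VSM-based clustering step.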
Pages: 114-117
Number of pages: 4