Research of Text Classification Based on Improved TF-IDF Algorithm

Authors
Liu, Cai-zhi [1]
Sheng, Yan-xiu [1]
Wei, Zhi-qiang [1]
Yang, Yong-Quan [1]
Affiliation
[1] Ocean Univ China, Coll Informat Sci & Engn, Qingdao, Peoples R China
Keywords
text classification; text representation; TF-IDF; Word2vec model
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, with the rapid development of Internet technology, the volume of text data has grown rapidly, and users must filter the information they need out of large amounts of text. Automatic text classification can help users find that information. Traditional text classification techniques ignore the contextual semantic links between words and the differing importance of individual words. To address these problems, this paper represents feature words as vectors using the deep learning tool Word2vec and computes the weight of each feature word with an improved TF-IDF algorithm. Multiplying each word's weight by its word vector yields a weighted vector representation of the word. Each text is then represented by summing its weighted word vectors, and text classification is carried out on these text representations.
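The pipeline described in the abstract (weight each word, scale its word vector by that weight, sum the results into one text vector) can be sketched roughly as follows. This is only an illustration under several assumptions: the weights use the standard TF-IDF formula, because the abstract does not state what the improved variant changes; the `embeddings` table holds made-up two-dimensional vectors standing in for a trained Word2vec model; and the function names `tfidf_weights` and `text_vector` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: weight} dict per document.

    Assumption: plain TF-IDF (tf * log(N / df)), not the paper's
    improved variant, which the abstract does not specify.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def text_vector(doc_weights, embeddings, dim):
    """Sum weight * word_vector over the distinct words of one document."""
    vec = [0.0] * dim
    for word, weight in doc_weights.items():
        emb = embeddings.get(word)
        if emb is None:
            continue  # skip words without a vector
        for i in range(dim):
            vec[i] += weight * emb[i]
    return vec


# Toy tokenized corpus and hypothetical 2-d vectors (stand-ins for Word2vec).
docs = [
    ["ocean", "fish", "ocean"],
    ["stock", "market", "fish"],
]
embeddings = {
    "ocean": [1.0, 0.0],
    "fish": [0.5, 0.5],
    "stock": [0.0, 1.0],
    "market": [0.0, 1.0],
}

vectors = [text_vector(w, embeddings, dim=2) for w in tfidf_weights(docs)]
```

In this toy corpus "fish" occurs in both documents, so its IDF is zero and it contributes nothing to either text vector, while "ocean" dominates the first vector and "stock" and "market" dominate the second. The resulting `vectors` could then be passed to any standard classifier; the abstract does not say which classifier the authors use.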
Pages: 218-222
Page count: 5