Research on Chinese Classification Based on TF-IDF

Authors
Xiao, Liang [1]
Yao, Nianmin [1]
Affiliation
[1] Dalian Neusoft Univ Informat, Dalian 116023, Peoples R China
Keywords
text classification; machine learning; TF-IDF; word segmentation
DOI
10.1117/12.2615301
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Chinese text classification is still at the research stage, and many machine learning algorithms can be applied to it, such as logistic regression, SVM, KNN, naive Bayes, random forest, and neural networks. In this paper, taking modern Chinese novels as an example, we classify texts with several of these algorithms, compare the results, and find that naive Bayes and the neural network perform best. After adjusting the TF-IDF algorithm and processing the segmented words according to their TF-IDF values, classification accuracy improves markedly: logistic regression, which has the lowest accuracy, improves by about 6.7%, while naive Bayes and the neural network reach 100%.
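The abstract does not say how the TF-IDF algorithm was adjusted or how the segmented words were processed, so the following is only a minimal sketch of the general idea, using the textbook TF-IDF definition (term frequency times log of inverse document frequency). It assumes the texts are already segmented into tokens (for example by a segmenter such as jieba), and it assumes that "processing by TF-IDF value" means keeping the highest-weighted tokens per document. The function names `tf_idf` and `keep_top_tokens`, the cutoff `k`, and the toy documents are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {token: weight} dict per document.

    docs is a list of token lists (text already segmented).
    Uses the plain definition tf * log(N / df); this is an assumption,
    not the adjusted variant described in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


def keep_top_tokens(doc, doc_weights, k):
    """Keep only the tokens that rank among the k highest weights (hypothetical filter)."""
    top = set(sorted(doc_weights, key=doc_weights.get, reverse=True)[:k])
    return [tok for tok in doc if tok in top]


# Toy, hand-segmented documents (illustrative only).
docs = [
    ["江湖", "剑", "的", "侠客"],
    ["爱情", "的", "眼泪", "的"],
    ["江湖", "的", "恩怨"],
]
weights = tf_idf(docs)
filtered = [keep_top_tokens(d, w, k=2) for d, w in zip(docs, weights)]
```

A token that occurs in every document, such as the particle "的" here, gets weight zero and is dropped by the filter, which is one plausible reason why processing tokens by TF-IDF value can help a downstream classifier.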
Pages: 6