Topological Data Analysis In Text Classification Based On Word Embedding And TF-IDF

Author
Wen, Xiaoyang [1]
Affiliation
[1] Beijing Normal Univ, Expt High Sch, Beijing, Peoples R China
DOI
10.1088/1742-6596/1634/1/012039
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
As a new and rapidly developing method in data science, topological data analysis (TDA) offers a new set of ways to look at data and to derive features from high-dimensional models using topological and geometric tools. In this paper, the author briefly introduces the topological concepts involved in several studies, then compares and examines different methods of extracting topological features from texts. The results show that these topological tools provide additional features of a document that the original methods do not detect. In the experiment, adding these topological features to the usual text mining tools improves prediction accuracy by as much as 5%. However, as expected, the topological features alone are not sufficient to classify text documents. Further experiments and discussion are needed to determine whether these methods can be combined to produce better classifications.
Pages: 6