Network-Based Bag-of-Words Model for Text Classification

Cited by: 28
Authors
Yan, Dongyang [1]
Li, Keping [1]
Gu, Shuang [1]
Yang, Liu [1]
Affiliation
[1] State Key Lab Rail Traff Control & Safety, Beijing 100044, Peoples R China
Funding
Beijing Natural Science Foundation
Keywords
Bag-of-words; classification; complex network; text correlation; KNN; complex networks; TF-IDF; similarity; language
DOI
10.1109/ACCESS.2020.2991074
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid growth of the internet and other media has produced a tremendous amount of text data, making it both challenging and valuable to find more effective ways to analyze text by machine. Text representation is the first step for a machine to understand text, and the most commonly used method is the Bag-of-Words (BoW) model. To form the vector representation of a document, the BoW model matches and counts each element in the document separately, neglecting much of the correlation information among words. In this paper, we propose a network-based bag-of-words model, which captures high-level structural and semantic information about the words. Because the structural and semantic information of a network reflects the relationships between nodes, the proposed model can distinguish the relations among words. We apply the proposed model to text classification and compare its performance with that of different text representation methods on four document datasets. The results show that the proposed method achieves the best performance with high efficiency, and that using the eccentricity property of the network as features gives the highest accuracy. We also investigate the influence of different network structures on the proposed method. Experimental results reveal that, for text classification, the dynamic network is more suitable than the static network and the hybrid network.
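The contrast the abstract draws between plain word counts and network-derived features can be illustrated with a minimal sketch. The abstract does not specify how the word network is built, so everything below is an assumption made for illustration: words are linked when they co-occur within a sliding window, the `window` parameter and all function names are hypothetical, and eccentricity is computed by breadth-first search within each word's connected component. It is not the authors' implementation.

```python
from collections import Counter, deque


def bag_of_words(tokens, vocabulary):
    """Plain BoW: count each vocabulary word independently of its neighbours."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]


def cooccurrence_network(tokens, window=2):
    """Undirected graph linking words that occur within `window` positions.

    Assumption: this co-occurrence rule is illustrative only; the abstract
    does not state how edges are defined.
    """
    graph = {w: set() for w in tokens}
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + window]:
            if v != w:
                graph[w].add(v)
                graph[v].add(w)
    return graph


def eccentricity(graph, source):
    """Largest shortest-path distance from `source` to any reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def network_features(tokens, vocabulary, window=2):
    """One eccentricity value per vocabulary word; 0 if the word is absent."""
    graph = cooccurrence_network(tokens, window)
    return [eccentricity(graph, w) if w in graph else 0 for w in vocabulary]


tokens = "the cat sat on the mat".split()
vocabulary = sorted(set(tokens))  # ['cat', 'mat', 'on', 'sat', 'the']
print(bag_of_words(tokens, vocabulary))      # [1, 1, 1, 1, 2]
print(network_features(tokens, vocabulary))  # [2, 3, 2, 3, 2]
```

In this toy document the counts for "cat", "mat", "on", and "sat" are identical, while their eccentricities differ, which is the kind of relational information the abstract says a count-only representation discards.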
Pages: 82641-82652
Page count: 12