Tolerance Rough Set-Based Bag-of-Words Model for Document Representation

Cited by: 0
Authors
Dong Qiu
Haihuan Jiang
Ruiteng Yan
Affiliations
[1] Chongqing University of Posts and Telecommunications, College of Science
[2] Chongqing University of Posts and Telecommunications, School of Computer Science and Technology
Keywords
Document representation; Tolerance rough set; Bag-of-Words
DOI
Not available
Abstract
Document representation is one of the foundations of natural language processing. The bag-of-words (BoW) model, a representative document representation model, is simple and effective. However, the traditional BoW model suffers from sparsity and a lack of latent semantic relations. To address these problems, we propose two tolerance rough set-based BoW models, called TRBoW1 and TRBoW2, which differ in their weight calculation methods. Unlike popular supervised representation methods, they are unsupervised and require no prior knowledge. Extending each document to its upper approximation with TRBoW1 or TRBoW2 mines the semantic relations among documents and makes the document vectors denser. Comparative experiments against various document representation methods for text classification on different datasets verify that our methods achieve the best performance.
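The upper-approximation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard tolerance rough set definitions from the document clustering literature (a term's tolerance class is the term itself plus every term that co-occurs with it in at least `theta` documents; a document's upper approximation is every term whose tolerance class overlaps the document). The abstract does not give the TRBoW1 or TRBoW2 weighting formulas, so this sketch covers set membership only, and the function names, the threshold `theta`, and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents."""
    cooc = defaultdict(int)
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    classes = {term: {term} for doc in docs for term in doc}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Return every term whose tolerance class shares a term with `doc`."""
    terms = set(doc)
    return {term for term, cls in classes.items() if cls & terms}


# Hypothetical toy corpus: "rough" and "set" co-occur in two documents.
docs = [
    ["rough", "set", "tolerance"],
    ["rough", "set", "document"],
    ["document", "vector"],
]
classes = tolerance_classes(docs, theta=2)
expanded = upper_approximation(["rough", "vector"], classes)
```

Here the query document contains "rough" but not "set"; because the two terms are tolerant of each other at `theta=2`, the expanded term set also includes "set", which is how the representation becomes denser. A weighted variant would then assign nonzero weights to such added terms.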
Pages: 1218-1226
Number of pages: 8
Related papers
50 in total
  • [1] Tolerance Rough Set-Based Bag-of-Words Model for Document Representation
    Qiu, Dong
    Jiang, Haihuan
    Yan, Ruiteng
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 1218 - 1226
  • [2] Fuzzy Bag-of-Words Model for Document Representation
    Zhao, Rui
    Mao, Kezhi
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2018, 26 (02) : 794 - 804
  • [3] A novel hierarchical Bag-of-Words model for compact action representation
    Sun, Qianru
    Liu, Hong
    Ma, Liqian
    Zhang, Tianwei
    [J]. NEUROCOMPUTING, 2016, 174 : 722 - 732
  • [4] Multi-Document Summarization using Distributed Bag-of-Words Model
    Mani, Kaustubh
    Verma, Ishan
    Meisheri, Hardik
    Dey, Lipika
    [J]. 2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, : 672 - 675
  • [5] Internet Traffic Classification based on bag-of-words model
    Zhang, Yin
    Zhou, Yi
    Chen, Kai
    [J]. 2012 IEEE GLOBECOM WORKSHOPS (GC WKSHPS), 2012, : 736 - 741
  • [6] Nonhierarchical document clustering based on a tolerance rough set model
    Ho, TB
    Nguyen, NB
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2002, 17 (02) : 199 - 212
  • [7] Hierarchical Document Clustering Based on Tolerance Rough Set Model
    Kawasaki, Saori
    Nguyen, Ngoc Binh
    Ho, Tu Bao
    [J]. LECTURE NOTES IN COMPUTER SCIENCE <D>, 2000, 1910 : 458 - 463
  • [8] Underwater Image Sparse Representation based on Bag-of-Words and Compressed Sensing
    Shi, Congcong
    Nian, Rui
    He, Bo
    Shen, Yue
    Lendasse, Amaury
    Yan, Tianhong
    [J]. OCEANS 2015 - MTS/IEEE WASHINGTON, 2015,
  • [9] A Modified Bag-of-Words Representation for Industrial Alarm Floods
    Alinezhad, Haniyeh Seyed
    Shang, Jun
    Chen, Tongwen
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON ADVANCED CONTROL OF INDUSTRIAL PROCESSES (ADCONIP 2022), 2022, : 331 - 336
  • [10] Bag-of-words representation for biomedical time series classification
    Wang, Jin
    Liu, Ping
    She, Mary F. H.
    Nahavandi, Saeid
    Kouzani, Abbas
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2013, 8 (06) : 634 - 644