Supervised Contrast Learning Text Classification Model Based on Data Quality Augmentation

Citations: 0
Authors
Wu, Liang [1]
Zhang, Fangfang [1]
Cheng, Chao [1]
Song, Shinan [1]
Affiliations
[1] Changchun Univ Technol, Sch Comp Sci & Engn, Changchun 130012, Peoples R China
Keywords
Text augmentation; data quality; text classification; contrast learning
DOI
10.1145/3653300
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Token-level data augmentation generates text samples by modifying the words of sentences. However, data that are not easily classified can negatively affect the model. In particular, random augmentation operations that ignore the role of keywords may generate low-quality supplementary samples. Therefore, we propose a supervised contrast learning text classification model based on data quality augmentation. First, dynamic training is used to screen high-quality data containing information beneficial to model training. The selected data are then augmented based on important words with tag information. To obtain a better text representation for the downstream classification task, we employ a standard supervised contrast loss to train the model. Finally, we conduct experiments on five text classification datasets to validate the effectiveness of our model. In addition, ablation experiments are conducted to verify the impact of each module on classification.
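The abstract names a "standard supervised contrast loss" without giving its form. The sketch below assumes the commonly used formulation, in which each anchor's loss is the average negative log-softmax score of its same-label samples against all other samples in the batch. The function name, the temperature value of 0.1, and the toy embeddings are illustrative assumptions, not taken from the paper, and this is not the authors' implementation.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors with at least one positive.

    Hypothetical helper for illustration; assumes non-zero embedding vectors.
    """
    # L2-normalize so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives are the other samples that share the anchor's label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner is skipped

        # Temperature-scaled similarity of anchor i to every other sample.
        logits = {
            a: sum(x * y for x, y in zip(normed[i], normed[a])) / temperature
            for a in range(n)
            if a != i
        }
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))

        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters in 2-D (assumed data, for illustration only).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the labels agree with the geometric clusters than when they cut across them, which is the behavior the loss is meant to reward.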
Pages: 12
Related papers
50 in total
  • [31] An Extension of the Aspect PLSA Model to Active and Semi-Supervised Learning for Text Classification
    Krithara, Anastasia
    Amini, Massih-Reza
    Goutte, Cyril
    Renders, Jean-Michel
    [J]. ARTIFICIAL INTELLIGENCE: THEORIES, MODELS AND APPLICATIONS, PROCEEDINGS, 2010, 6040 : 183 - +
  • [32] Improving Text Classification with Large Language Model-Based Data Augmentation
    Zhao, Huanhuan
    Chen, Haihua
    Ruggles, Thomas A.
    Feng, Yunhe
    Singh, Debjani
    Yoon, Hong-Jun
    [J]. ELECTRONICS, 2024, 13 (13)
  • [33] MPCNN with Knowledge Augmentation: A Model for Chinese Text Classification
    Zhang, Xiaozeng
    Fang, Ailian
    [J]. INTELLIGENT COMPUTING METHODOLOGIES, PT III, 2022, 13395 : 141 - 149
  • [34] Rough set and ensemble learning based semi-supervised algorithm for text classification
    Shi, Lei
    Ma, Xinming
    Xi, Lei
    Duan, Qiguo
    Zhao, Jingying
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (05) : 6300 - 6306
  • [35] A New SVM Method for Short Text Classification Based on Semi-Supervised Learning
    Yin, Chunyong
    Xiang, Jun
    Zhang, Hui
    Wang, Jin
    Yin, Zhichao
    Kim, Jeong-Uk
    [J]. 2015 4TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION TECHNOLOGY AND SENSOR APPLICATION (AITS), 2015, : 100 - 103
  • [36] Data Augmentation and Semi-supervised Learning for Deep Neural Networks-based Text Classifier
    Shim, Heereen
    Luca, Stijn
    Lowet, Dietwig
    Vanrumste, Bart
    [J]. PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020, : 1119 - 1126
  • [37] Imbalanced Classification Algorithm for Semi Supervised Text Learning (iCASSTLE)
    Banerjee, Debanjana
    Prabhat, Gyan
    Bhowal, Riyanka
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1012 - 1016
  • [38] Text Message Classification Using Supervised Machine Learning Algorithms
    Merugu, Suresh
    Reddy, M. Chandra Shekhar
    Goyal, Ekansh
    Piplani, Lakshay
    [J]. ICCCE 2018, 2019, 500 : 141 - 150
  • [39] Performance Analysis of Supervised Machine Learning Algorithms for Text Classification
    Mishu, Sadia Zaman
    Rafiuddin, S. M.
    [J]. PROCEEDINGS OF THE 2016 19TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT), 2016, : 409 - 413
  • [40] Amharic Text Complexity Classification Using Supervised Machine Learning
    Nigusie, Gebregziabihier
    Tegegne, Tesfa
    [J]. ARTIFICIAL INTELLIGENCE AND DIGITALIZATION FOR SUSTAINABLE DEVELOPMENT, ICAST 2022, 2023, 455 : 1 - 12