Text Generation for Imbalanced Text Classification

Cited by: 0
Authors
Akkaradamrongrat, Suphamongkol [1]
Kachamas, Pornpimon [2]
Sinthupinyo, Sukree [1]
Affiliations
[1] Chulalongkorn Univ, Dept Comp Engn, Bangkok, Thailand
[2] Chulalongkorn Univ, Grad Sch, Technopreneurship & Innovat Management, Bangkok, Thailand
Keywords
imbalanced text classification; text generation; Markov chains; LSTM
DOI
10.1109/jcsse.2019.8864181
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Imbalanced data is common in real-world datasets. It biases classification models toward the majority class, which is often the negative class, so that most samples are predicted as negative. In this research, text generation techniques were used to create synthetic minority-class samples in order to balance a text dataset. Two text generation methods, one based on Markov chains and one based on Long Short-Term Memory (LSTM) networks, were applied and compared in terms of their ability to improve the performance of imbalanced text classification. The experimental study used an LSTM network classifier, with a traditional over-sampling technique as the baseline. The study used our dataset of Thai-language advertisement texts from Facebook. Judging by the increase in recall, these techniques improved the models' ability to predict positive samples, which are the minority class. The Markov chains technique outperformed both traditional over-sampling and LSTM-based text generation in the majority of the models.
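The Markov chain idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a first-order, word-level chain, texts that are already tokenized and joined by spaces (Thai text would need a word segmenter first), and hypothetical helper names (`build_chain`, `generate`, `balance_with_markov`) and boundary markers chosen only for this example.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers, not taken from the paper.
START, END = "<s>", "</s>"


def build_chain(texts):
    """Collect first-order word transitions from space-tokenized texts."""
    chain = defaultdict(list)
    for text in texts:
        tokens = [START] + text.split() + [END]
        for current, following in zip(tokens, tokens[1:]):
            # Duplicates are kept so frequent transitions are sampled more often.
            chain[current].append(following)
    return chain


def generate(chain, rng, max_len=30):
    """Random-walk the chain from START until END or max_len words."""
    token, words = START, []
    while len(words) < max_len:
        token = rng.choice(chain[token])
        if token == END:
            break
        words.append(token)
    return " ".join(words)


def balance_with_markov(minority_texts, majority_size, seed=0):
    """Return synthetic minority texts until the class sizes match."""
    usable = [t for t in minority_texts if t.split()]
    if not usable:
        raise ValueError("need at least one non-empty minority text")
    rng = random.Random(seed)
    chain = build_chain(usable)
    synthetic = []
    while len(minority_texts) + len(synthetic) < majority_size:
        synthetic.append(generate(chain, rng))
    return synthetic


# Toy English stand-ins for minority-class advertisement texts.
minority = [
    "buy cheap phone now",
    "buy new phone today",
    "cheap phone sale now",
]
synthetic = balance_with_markov(minority, majority_size=10)
for text in synthetic:
    print(text)
```

Unlike plain over-sampling, which duplicates existing minority samples, the generated texts can recombine word sequences from different samples, although every adjacent word pair still comes from the original minority data.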
Pages: 181-186
Page count: 6