Text Generation for Imbalanced Text Classification

Cited by: 0

Authors:

Akkaradamrongrat, Suphamongkol ^{[1]}

Kachamas, Pornpimon ^{[2]}

Sinthupinyo, Sukree ^{[1]}

Affiliations:

[1] Chulalongkorn Univ, Dept Comp Engn, Bangkok, Thailand

[2] Chulalongkorn Univ, Grad Sch, Technopreneurship & Innovat Management, Bangkok, Thailand

Source:

2019 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2019) | 2019

Keywords:

imbalanced text classification; text generation; Markov chains; LSTM

DOI:

10.1109/jcsse.2019.8864181

Chinese Library Classification number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

Imbalanced data is common in real-world datasets. It biases classification models: they predict most samples as the majority class, which is often the negative class. In this research, text generation techniques were used to generate synthetic minority-class samples to balance the text dataset. Two text generation methods, one using Markov chains and one using Long Short-Term Memory (LSTM) networks, were applied and compared in terms of their ability to improve the performance of imbalanced text classification. Our experimental study is based on an LSTM network classifier. Traditional over-sampling was also used as a baseline. The study used our Thai-language advertisement text dataset from Facebook. The increase in recall showed that applying these techniques improved the ability to build models that predict more positive samples, which are the minority samples. The Markov chains technique outperformed both traditional over-sampling and text generation using LSTM in the majority of the models.
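To make the Markov chain approach concrete, the following is a minimal sketch of how minority-class text could be generated until the classes are balanced. It is not the authors' implementation: the function names (`build_chain`, `generate`, `balance_with_markov`), the first-order default, the `max_len` cap, and whitespace tokenization are all assumptions made for illustration. Thai text in particular has no spaces between words, so the sketch assumes the input has already been segmented into space-separated tokens.

```python
import random
from collections import defaultdict


def build_chain(texts, order=1):
    """Map each state (tuple of `order` tokens) to the tokens seen after it."""
    chain = defaultdict(list)
    starts = []  # states that begin a training text
    for text in texts:
        tokens = text.split()  # assumption: tokens are space-separated
        if len(tokens) <= order:
            continue  # too short to yield a transition
        starts.append(tuple(tokens[:order]))
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            chain[state].append(tokens[i + order])
    return chain, starts


def generate(chain, starts, max_len=20, rng=random):
    """Random-walk the chain from a random start state to build one text."""
    state = rng.choice(starts)
    out = list(state)
    while len(out) < max_len:
        followers = chain.get(state)
        if not followers:
            break  # reached a state with no observed successor
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


def balance_with_markov(minority, majority_count, order=1, seed=0):
    """Return synthetic minority texts so both classes have equal size."""
    rng = random.Random(seed)
    chain, starts = build_chain(minority, order)
    if not starts:
        raise ValueError("minority texts are too short for this order")
    synthetic = []
    while len(minority) + len(synthetic) < majority_count:
        synthetic.append(generate(chain, starts, rng=rng))
    return synthetic


# Hypothetical toy data, not taken from the paper's dataset.
minority = [
    "buy cheap phone now",
    "buy new phone today",
    "cheap phone sale now",
]
synthetic = balance_with_markov(minority, majority_count=10)
```

Because duplicated transitions are kept in each follower list, sampling uniformly from that list reproduces the observed transition frequencies. A higher `order` gives more fluent but less varied output, which is the usual trade-off with this kind of generator.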


Pages: 181 - 186

Number of pages: 6

50 items in total

[1] The Text Classification for Imbalanced Data Sets
Li, Yanling
Zhu, Yehang
Yang, Ping
[J]. ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 2, 2008 : 778 - +
[2] Utilizing DTRS for Imbalanced Text Classification
Zhou, Bing
Yao, Yiyu
Liu, Qingzhong
[J]. ROUGH SETS, (IJCRS 2016), 2016, 9920 : 219 - 228
[3] Distributional Random Oversampling for Imbalanced Text Classification
Moreo, Alejandro
Esuli, Andrea
Sebastiani, Fabrizio
[J]. SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016 : 805 - 808
[4] Adaptable Focal Loss for Imbalanced Text Classification
Cao, Lu
Liu, Xinyue
Shen, Hong
[J]. PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PDCAT 2021, 2022, 13148 : 466 - 475
[5] Imbalanced text classification: A term weighting approach
Liu, Ying
Loh, Han Tong
Sun, Aixin
[J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (01) : 690 - 701
[6] Optimal Feature Selection for Imbalanced Text Classification
Khurana, Anshu
Verma, Om Prakash
[J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 135 - 147
[7] Classwise Clustering for Classification of Imbalanced Text Data
Swarnalatha, K.
Guru, D. S.
Anami, Basavaraj S.
Suhil, Mahamad
[J]. EMERGING RESEARCH IN ELECTRONICS, COMPUTER SCIENCE AND TECHNOLOGY, ICERECT 2018, 2019, 545 : 83 - 94
[8] Improving Imbalanced Text Classification with Dynamic Curriculum Learning
Zhang, Xulong
Wang, Jianzong
Cheng, Ning
Xiao, Jing
[J]. 2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022 : 1031 - 1036
[9] A Novel Term Weighting Scheme for Imbalanced Text Classification
Tantisripreecha, Tanapon
Soonthornphisaj, Nuanwan
[J]. Informatica (Slovenia), 2022, 46 (02) : 259 - 268
[10] Effects of Preprocessing on Text Classification in Balanced and Imbalanced Datasets
Karaca, Mehmet F.
[J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2024, 18 (03) : 591 - 609
