Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Authors
Xu-Ying Liu
Sheng-Tao Wang
Min-Ling Zhang
Affiliations
[1] Southeast University, School of Computer Science and Engineering
[2] Ministry of Education, Key Laboratory of Computer Network and Information Integration (Southeast University)
[3] Collaborative Innovation Center for Wireless Communications Technology
Keywords
machine learning; data mining; class imbalance; over-sampling; boosting; transfer learning
DOI
Not available
Abstract
The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and uses transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of misclassified synthetic samples decrease and the weights of misclassified original instances increase, while the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and therefore have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, which makes minority class instances more influential. Only original data are used to estimate the error rate, so that the estimate is not affected by noise in the synthetic samples. Finally, because the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
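The per-iteration weight update described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives the direction of each update but not the formulas, so the exponential factor, the parameters `synthetic_decay` and `minority_boost`, and the function names are all assumptions introduced here.

```python
import math


def error_on_original(weights, is_synthetic, misclassified):
    """Weighted error rate estimated on original (non-synthetic) instances only."""
    total = sum(w for w, s in zip(weights, is_synthetic) if not s)
    if total <= 0:
        raise ValueError("need at least one original instance with positive weight")
    wrong = sum(
        w for w, s, m in zip(weights, is_synthetic, misclassified) if m and not s
    )
    return wrong / total


def update_weights(weights, is_synthetic, is_minority, misclassified,
                   synthetic_decay=0.5, minority_boost=2.0):
    """One hypothetical reweighting step in the spirit of the abstract.

    synthetic_decay (in (0, 1)) and minority_boost (> 1) are assumed
    parameters, not values taken from the paper.
    """
    err = error_on_original(weights, is_synthetic, misclassified)
    err = min(max(err, 1e-10), 0.499)      # keep the log below finite and positive
    alpha = math.log((1.0 - err) / err)    # AdaBoost-style factor (assumption)

    updated = []
    for w, synth, minority, wrong in zip(weights, is_synthetic,
                                         is_minority, misclassified):
        if not wrong:
            updated.append(w)              # correctly classified: unchanged
            continue
        # Minority class instances change more than majority class instances.
        scale = minority_boost if minority else 1.0
        if synth:
            updated.append(w * synthetic_decay ** scale)   # potential noise: shrink
        else:
            updated.append(w * math.exp(alpha * scale))    # original data: grow
    return updated


# Two synthetic minority samples followed by four original instances.
weights = [1.0] * 6
is_synthetic = [True, True, False, False, False, False]
is_minority = [True, True, True, False, False, False]
misclassified = [True, False, True, False, False, False]
new_weights = update_weights(weights, is_synthetic, is_minority, misclassified)
```

In this toy run the misclassified synthetic sample loses weight, the misclassified original minority instance gains weight, and every correctly classified instance keeps its weight. The weights are deliberately left unnormalized so that the "remain unchanged" behaviour is visible; a full boosting loop would typically normalize them before training the next weak learner.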
Pages: 996 - 1009
Page count: 13
Related papers
50 in total
  • [21] Imbalanced data classification using improved synthetic minority over-sampling technique
    Anusha, Yamijala
    Visalakshi, R.
    Srinivas, Konda
    [J]. MULTIAGENT AND GRID SYSTEMS, 2023, 19 (02) : 117 - 131
  • [22] A Method for Class-Imbalance Learning in Android Malware Detection
    Guan, Jun
    Jiang, Xu
    Mao, Baolei
    [J]. ELECTRONICS, 2021, 10 (24)
  • [23] Distributed Sparse Class-Imbalance Learning and Its Applications
    Maurya, Chandresh Kumar
    Toshniwal, Durga
    Venkoparao, Gopalan Vijendran
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2021, 7 (05) : 832 - 844
  • [24] METAbolomics data Balancing with Over-sampling Algorithms (META-BOA): an online resource for addressing class imbalance
    Hashimoto-Roth, Emily
    Surendra, Anuradha
    Lavallee-Adam, Mathieu
    Bennett, Steffany A. L.
    Cuperlovic-Culf, Miroslava
    [J]. BIOINFORMATICS, 2022, 38 (23) : 5326 - 5327
  • [25] Safe Level Graph for Synthetic Minority Over-sampling Techniques
    Bunkhumpornpat, Chumphol
    Subpaiboonkit, Sitthichoke
    [J]. 2013 13TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT): COMMUNICATION AND INFORMATION TECHNOLOGY FOR NEW LIFE STYLE BEYOND THE CLOUD, 2013, : 570 - 575
  • [26] Towards Mitigating the Class-Imbalance Problem for Partial Label Learning
    Wang, Jing
    Zhang, Min-Ling
    [J]. KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 2427 - 2436
  • [27] LVQ-SMOTE - Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data
    Nakamura, Munehiro
    Kajiwara, Yusuke
    Otsuka, Atsushi
    Kimura, Haruhiko
    [J]. BIODATA MINING, 2013, 6
  • [28] Graph-Based Class-Imbalance Learning With Label Enhancement
    Du, Guodong
    Zhang, Jia
    Jiang, Min
    Long, Jinyi
    Lin, Yaojin
    Li, Shaozi
    Tan, Kay Chen
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 6081 - 6095
  • [29] A systematic review for class-imbalance in semi-supervised learning
    Willian Dihanster Gomes de Oliveira
    Lilian Berton
    [J]. Artificial Intelligence Review, 2023, 56 : 2349 - 2382
  • [30] Large-Scale Distributed Sparse Class-Imbalance Learning
    Maurya, Chandresh Kumar
    Toshniwal, Durga
    [J]. INFORMATION SCIENCES, 2018, 456 : 1 - 12