A New Performance Measure for Class Imbalance Learning. Application to Bioinformatics Problems

Cited by: 26
Authors
Batuwita, Rukshan [1]
Palade, Vasile [1]
Affiliations
[1] Univ Oxford, Comp Lab, Oxford OX1 3QD, England
Keywords
Performance Measures; Class Imbalance Learning; Bioinformatics; Model Selection; SVMs; Classification
DOI
10.1109/ICMLA.2009.126
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In class imbalance learning, the performance measure used for model selection plays a vital role. Past research has established that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. To overcome this problem, other performance measures, such as the Geometric-mean (Gm) and F-measure (Fm), have been used for imbalanced dataset learning. Training a classifier with an imbalanced dataset (where the positive class is the minority class) usually produces sub-optimal models with a higher Specificity (SP) and a lower Sensitivity (SE). By applying class imbalance learning methods, we can often increase the SE by sacrificing some amount of SP. In some types of real-world imbalanced classification problems, such as gene-finding problems in Bioinformatics, it is important to improve the SE as much as possible while keeping the reduction of SP to a minimum. In this paper, we show that, for this type of classification problem, the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. To circumvent these problems, we introduce a new performance measure, called the Adjusted Geometric-mean (AGm). We show, both analytically and empirically on two real-world Bioinformatics datasets, that AGm can perform better than the Gm and Fm metrics.
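The measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (SE = TP / (TP + FN), SP = TN / (TN + FP), Gm = sqrt(SE x SP), and Fm as the harmonic mean of precision and SE with equal weighting); the paper may parameterize Fm differently. The function name `confusion_metrics` and the confusion-matrix counts are hypothetical, chosen only to show how overall accuracy can favor a model that never detects the minority class. AGm is not included because this record does not state its formula.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Compute SE, SP, accuracy, Gm and Fm from confusion-matrix counts.

    Assumes the standard textbook definitions; Fm is the balanced
    (beta = 1) F-measure, which is an assumption of this sketch.
    """
    se = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0          # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)              # overall accuracy
    gm = math.sqrt(se * sp)                            # geometric mean of SE and SP
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + se
    fm = 2 * precision * se / denom if denom else 0.0  # harmonic mean of precision and SE
    return {"SE": se, "SP": sp, "accuracy": acc, "Gm": gm, "Fm": fm}


# Hypothetical imbalanced dataset: 50 positives (minority), 950 negatives.
# Model A predicts every example as negative.
majority = confusion_metrics(tp=0, fn=50, tn=950, fp=0)
# Model B trades some SP for a much higher SE.
balanced = confusion_metrics(tp=40, fn=10, tn=855, fp=95)

for name, scores in (("always-negative", majority), ("rebalanced", balanced)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the always-negative model has the higher accuracy even though its SE is zero, while Gm and Fm both rank the rebalanced model higher, which is the motivation the abstract gives for moving away from accuracy.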
Pages: 545-550
Number of pages: 6