Relationships between Diversity of Classification Ensembles and Single-Class Performance Measures

Cited by: 87
Authors
Wang, Shuo [1]
Yao, Xin [1]
Affiliations
[1] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, W Midlands, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Class imbalance learning; ensemble learning; diversity; single-class performance measures; data mining; STATISTICS; ACCURACY;
DOI
10.1109/TKDE.2011.207
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In class imbalance learning problems, how to better recognize examples from the minority class is the key focus, since it is usually more important and expensive than the majority class. Quite a few ensemble solutions have been proposed in the literature with varying degrees of success. It is generally believed that diversity in an ensemble could help to improve the performance of class imbalance learning. However, no study has actually investigated diversity in depth in terms of its definitions and effects in the context of class imbalance learning. It is unclear whether diversity will have a similar or different impact on the performance of minority and majority classes. In this paper, we aim to gain a deeper understanding of if and when ensemble diversity has a positive impact on the classification of imbalanced data sets. First, we explain when and why diversity measured by the Q-statistic can bring improved overall accuracy based on two classification patterns proposed by Kuncheva et al. We define and give insights into good and bad patterns in imbalanced scenarios. Then, the pattern analysis is extended to single-class performance measures, including recall, precision, and F-measure, which are widely used in class imbalance learning. Six different situations of diversity's impact on these measures are obtained through theoretical analysis. Finally, to further understand how diversity affects the single-class performance and overall performance in class imbalance problems, we carry out extensive experimental studies on both artificial data sets and real-world benchmarks with highly skewed class distributions. We find strong correlations between diversity and the discussed performance measures. Diversity shows a positive impact on the minority class in general. It is also beneficial to the overall performance in terms of AUC and G-mean.
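As a rough illustration of the quantities named in the abstract, the sketch below computes the pairwise Q-statistic between ensemble members and the single-class measures (recall, precision, F-measure) together with G-mean for a binary problem. It is not the authors' code: the function names, the list-based inputs, and the choice to return 0.0 when a ratio is undefined are all assumptions made for this example. A lower average Q-statistic indicates a more diverse ensemble.

```python
from itertools import combinations
from math import sqrt


def q_statistic(correct_a, correct_b):
    """Q-statistic for two classifiers, given per-example correctness flags."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1          # both correct
        elif not a and not b:
            n00 += 1          # both wrong
        elif a:
            n10 += 1          # only the first is correct
        else:
            n01 += 1          # only the second is correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        return 0.0            # undefined case; 0.0 is an assumed convention
    return (n11 * n00 - n01 * n10) / denom


def mean_pairwise_q(y_true, member_preds):
    """Average Q-statistic over all pairs of ensemble members."""
    oracle = [[p == t for p, t in zip(preds, y_true)] for preds in member_preds]
    pairs = list(combinations(oracle, 2))
    return sum(q_statistic(a, b) for a, b in pairs) / len(pairs)


def single_class_measures(y_true, y_pred, positive=1):
    """Recall, precision, F-measure of the positive class, plus G-mean."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f_measure": f_measure,
        "g_mean": sqrt(recall * specificity),  # geometric mean of class recalls
    }
```

Here `positive` would typically be set to the minority-class label, and `member_preds` holds one list of predicted labels per ensemble member.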
Pages: 206-219
Page count: 14
Related papers
50 items in total
  • [1] Theoretical Study of the Relationship Between Diversity and Single-Class Measures for Class Imbalance Learning
    Wang, Shuo
    Yao, Xin
    2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 76 - 81
  • [2] Single-class classification with mapping convergence
    Yu, HJ
    MACHINE LEARNING, 2005, 61 (1-3) : 49 - 69
  • [3] Single-Class Classification with Mapping Convergence
    Hwanjo Yu
    Machine Learning, 2005, 61 : 49 - 69
  • [4] Exploring the Relationships between Data Complexity and Classification Diversity in Ensembles
    Garcia, Nathan Formentin
    Tiggeman, Frederico
    Borges, Eduardo N.
    Lucca, Giancarlo
    Santos, Helida
    Dimuro, Gracaliz
    PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS (ICEIS 2021), VOL 1, 2021, : 652 - 659
  • [5] Fast Single-Class Classification and the Principle of Logit Separation
    Keren, Gil
    Sabato, Sivan
    Schuller, Bjoern
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 227 - 236
  • [6] Analysis of loss functions for fast single-class classification
    Keren, Gil
    Sabato, Sivan
    Schuller, Bjoern
    KNOWLEDGE AND INFORMATION SYSTEMS, 2020, 62 (01) : 337 - 358
  • [7] Analysis of loss functions for fast single-class classification
    Gil Keren
    Sivan Sabato
    Björn Schuller
    Knowledge and Information Systems, 2020, 62 : 337 - 358
  • [8] Diversity in Ensembles for One-Class Classification
    Krawczyk, Bartosz
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, 2013, 185 : 119 - 129
  • [9] Single-class classification augmented with unlabeled data: A symbolic approach
    Skabar, A
    AI 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2003, 2903 : 735 - 746
  • [10] Diversity measures for one-class classifier ensembles
    Krawczyk, Bartosz
    Wozniak, Michal
    NEUROCOMPUTING, 2014, 126 : 36 - 44