Do unbalanced data have a negative effect on LDA?

Cited by: 51
Authors
Xue, Jing-Hao [1]
Titterington, D. Michael [1]
Affiliations
[1] Univ Glasgow, Dept Stat, Glasgow G12 8QQ, Lanark, Scotland
Keywords
area under an ROC curve (AUC); linear discriminant analysis (LDA); misclassification error rate (ER); unbalanced data
DOI
10.1016/j.patcog.2007.11.008
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For two-class discrimination, Xie and Qiu [The effect of imbalanced data sets on LDA: a theoretical and empirical analysis, Pattern Recognition 40 (2) (2007) 557-562] claimed that, when the covariance matrices of the two classes are unequal, a (class) unbalanced data set has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world data sets, Xie and Qiu provided empirical evidence for this claim, using the area under the receiver operating characteristic curve (AUC) as the performance metric. We suggest that the claim is vague, if not misleading; that Xie and Qiu present no solid theoretical analysis; and that AUC can lead to a quite different conclusion from that given by the misclassification error rate (ER) about the discrimination performance of LDA on unbalanced data sets. Our empirical and simulation studies suggest that, for LDA, the increase in the median AUC from re-balancing (an improvement in performance) is relatively small, whereas the increase in the median ER from re-balancing (a decline in performance) is relatively large. Our study therefore finds no reliable empirical evidence for the claim that a (class) unbalanced data set has a negative effect on the performance of LDA. In addition, re-balancing affects the performance of LDA for data sets with either equal or unequal covariance matrices, which indicates that unequal covariance matrices are not a key reason for the difference in performance between original and re-balanced data. © 2007 Elsevier Ltd. All rights reserved.
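The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental code. It assumes two 2-D Gaussian classes with unequal covariance, a 9:1 class ratio, re-balancing by random undersampling of the majority class, and a test set that keeps the original class ratio. All function names, parameter values, and the choice of undersampling are hypothetical choices made for illustration only.

```python
import math
import random

def sample(n0, n1, rng):
    # Assumed toy data: class 0 ~ N((0,0), I), class 1 ~ N((1.5,1.5), 4I).
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n0)]
    X += [(rng.gauss(1.5, 2), rng.gauss(1.5, 2)) for _ in range(n1)]
    return X, [0] * n0 + [1] * n1

def fit_lda(X, y):
    # Two-class LDA in 2-D: pooled covariance, priors from class frequencies.
    groups = {c: [x for x, lab in zip(X, y) if lab == c] for c in (0, 1)}
    means = {c: [sum(p[j] for p in pts) / len(pts) for j in range(2)]
             for c, pts in groups.items()}
    s00 = s01 = s11 = 0.0
    for c, pts in groups.items():
        for p in pts:
            d0, d1 = p[0] - means[c][0], p[1] - means[c][1]
            s00 += d0 * d0
            s01 += d0 * d1
            s11 += d1 * d1
    n = len(X) - 2  # degrees of freedom of the pooled estimate
    s00, s01, s11 = s00 / n, s01 / n, s11 / n
    det = s00 * s11 - s01 * s01
    dm0 = means[1][0] - means[0][0]
    dm1 = means[1][1] - means[0][1]
    # w = S^{-1} (mu1 - mu0), using the closed-form 2x2 inverse.
    w = [(s11 * dm0 - s01 * dm1) / det, (s00 * dm1 - s01 * dm0) / det]
    mid = [(means[0][j] + means[1][j]) / 2 for j in range(2)]
    # Intercept: midpoint term plus the log prior ratio.
    b = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(len(groups[1]) / len(groups[0]))
    return w, b

def scores(w, b, X):
    return [w[0] * p[0] + w[1] * p[1] + b for p in X]

def error_rate(s, y):
    # Predict class 1 when the discriminant score is positive.
    return sum((v > 0) != (c == 1) for v, c in zip(s, y)) / len(y)

def auc(s, y):
    # Mann-Whitney form: share of (positive, negative) pairs ranked correctly.
    pos = [v for v, c in zip(s, y) if c == 1]
    neg = [v for v, c in zip(s, y) if c == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def rebalance(X, y, rng):
    # Assumption: re-balance by randomly undersampling the larger class.
    idx0 = [i for i, c in enumerate(y) if c == 0]
    idx1 = [i for i, c in enumerate(y) if c == 1]
    k = min(len(idx0), len(idx1))
    keep = rng.sample(idx0, k) + rng.sample(idx1, k)
    return [X[i] for i in keep], [y[i] for i in keep]

rng = random.Random(0)
X_tr, y_tr = sample(900, 100, rng)
X_te, y_te = sample(1800, 200, rng)
training_sets = {"original": (X_tr, y_tr),
                 "re-balanced": rebalance(X_tr, y_tr, rng)}
results = {}
for name, (X, y) in training_sets.items():
    w, b = fit_lda(X, y)
    s = scores(w, b, X_te)
    results[name] = (error_rate(s, y_te), auc(s, y_te))
    print(name, "ER=%.3f" % results[name][0], "AUC=%.3f" % results[name][1])
```

The two metrics respond to different parts of the fitted rule. AUC depends only on how the scores rank the test points, so the intercept `b` has no effect on it. ER depends on the decision threshold, which shifts with the class proportions in the training set. A single run like this says nothing about medians over repeated samples, which is what the abstract reports.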
Pages: 1558-1571
Page count: 14