ImbTreeAUC: An R package for building classification trees using the area under the ROC curve (AUC) on imbalanced datasets

Cited by: 9
Authors
Gajowniczek, Krzysztof [1]
Zabkowski, Tomasz [1]
Affiliations
[1] Warsaw Univ Life Sci SGGW, Inst Informat Technol, Dept Artificial Intelligence, PL-02776 Warsaw, Poland
Keywords
Decision trees; Area under the ROC curve; Cost-sensitive learning; Imbalanced data
DOI
10.1016/j.softx.2021.100755
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In this paper, we propose a novel R package, ImbTreeAUC, for building binary and multiclass decision trees using the area under the receiver operating characteristic (ROC) curve (AUC). The package provides nonstandard measures for selecting both the optimal split point of an attribute and the optimal attribute to split on, by applying local, semiglobal and global AUC measures. Additionally, ImbTreeAUC can handle imbalanced data, which is a challenging issue in many practical applications. The package supports cost-sensitive learning, through a user-defined misclassification cost matrix, as well as weight-sensitive learning. It accepts all types of attributes, including continuous, ordered and nominal attributes. The package and its code are freely available. © 2021 The Author(s). Published by Elsevier B.V.
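The idea of choosing a split point by AUC can be illustrated with a small sketch. It is written in Python for illustration only and is not the ImbTreeAUC implementation or its API: the function names `auc_binary` and `best_split_by_auc` are hypothetical, the target is assumed to be binary (0/1), and the scoring rule (each observation is scored by the positive-class proportion of the child node it falls into) is an assumption about how an AUC-based split criterion might look, not the package's local, semiglobal or global measure.

```python
from typing import Sequence, Tuple


def auc_binary(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_split_by_auc(x: Sequence[float], y: Sequence[int]) -> Tuple[float, float]:
    """Hypothetical split search: try every midpoint between consecutive
    distinct values of a continuous attribute and keep the threshold whose
    two-leaf tree gives the highest AUC."""
    values = sorted(set(x))
    if len(values) < 2:
        raise ValueError("need at least two distinct attribute values")
    best_threshold, best_auc = values[0], -1.0
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        # Score each observation by the positive proportion of its child node.
        p_left = sum(left) / len(left)
        p_right = sum(right) / len(right)
        scores = [p_left if xi <= threshold else p_right for xi in x]
        auc = auc_binary(scores, y)
        if auc > best_auc:
            best_threshold, best_auc = threshold, auc
    return best_threshold, best_auc


if __name__ == "__main__":
    # Imbalanced toy data: 8 negatives, 2 positives.
    x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
    print(best_split_by_auc(x, y))
```

Because AUC depends only on how positives rank against negatives, the criterion in this sketch does not reward a split merely for isolating the majority class, which is one reason AUC is a common choice for imbalanced data.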
Pages: 7