Grouped Variable Selection Using Area under the ROC with Imbalanced Data

Times cited: 2
Authors
Li, Yang [1 ,2 ]
Qin, Yichen [3 ]
Wang, Limin [4 ]
Chen, Jiaxu [4 ]
Ma, Shuangge [2 ,5 ]
Affiliations
[1] Renmin Univ China, Ctr Appl Stat, Beijing 100872, Peoples R China
[2] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Univ Cincinnati, Dept Operat Business Analyt & Informat Syst, Cincinnati, OH 45221 USA
[4] Beijing Univ Chinese Med, Sch Preclin Med, Beijing, Peoples R China
[5] Yale Univ, Dept Biostat, New Haven, CT USA
Keywords
Area under ROC; Group lasso; Imbalanced data; True positive rate; REGRESSION SHRINKAGE; CLASSIFICATION; LASSO;
DOI
10.1080/03610918.2013.818691
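Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract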
Imbalanced data lead to biased classification and low classification accuracy for the minority class. In this article, we propose a methodology for selecting grouped variables using the area under the ROC curve with an adjustable prediction cut point. The proposed method enhances classification accuracy for the minority class by maximizing the true positive rate. Simulation results show that the proposed method is appropriate for both categorical and continuous covariates. An illustrative analysis of the SHS data in TCM is discussed to show a reasonable application of the proposed method.
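The abstract names two quantities without defining them: the area under the ROC curve and the true positive rate at an adjustable prediction cut point. The following is a minimal sketch of how both can be computed for a set of predicted scores; it is not the authors' selection procedure, and the function names `empirical_auc` and `rates_at_cut` as well as the toy scores and labels are assumptions made purely for illustration.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rates_at_cut(scores, labels, cut):
    """Return (true positive rate, false positive rate) when a case is
    predicted positive whenever its score is at least `cut`."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    tpr = sum(1 for s in pos if s >= cut) / len(pos)
    fpr = sum(1 for s in neg if s >= cut) / len(neg)
    return tpr, fpr


# Hypothetical imbalanced toy data: 2 minority (label 1) and 6 majority cases.
scores = [0.9, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05, 0.6]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

auc = empirical_auc(scores, labels)
default_cut = rates_at_cut(scores, labels, 0.5)    # conventional cut point
lowered_cut = rates_at_cut(scores, labels, 0.35)   # cut point moved downward
```

On this toy data, lowering the cut point from 0.5 to 0.35 captures both minority cases at the cost of more false positives, while the AUC, which does not depend on the cut point, is unchanged. This is the kind of trade-off an adjustable cut point exposes.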
Pages: 1268-1280
Number of pages: 13