Feature Selection for Maximizing the Area Under the ROC Curve

Cited by: 42
Authors
Wang, Rui [1]
Tang, Ke [1]
Affiliation
[1] Univ Sci & Technol China, NICAL, Hefei 230027, Peoples R China
Keywords
CLASSIFICATION; PREDICTION; CANCER; TUMOR
DOI
10.1109/ICDMW.2009.25
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Feature selection is an important pre-processing step for solving classification problems. A good feature selection method may not only improve the performance of the final classifier but also reduce its computational complexity. Traditionally, feature selection methods were developed to maximize the classification accuracy of a classifier. Recently, both theoretical and experimental studies revealed that a classifier with the highest accuracy might not be ideal in real-world problems. Instead, the Area Under the ROC Curve (AUC) has been suggested as the alternative metric, and many existing learning algorithms have been modified in order to seek the classifier with maximum AUC. However, little work has been done to develop new feature selection methods suited to AUC maximization. To fill this gap in the literature, we propose in this paper a novel algorithm, called the AUC and Rank Correlation coefficient Optimization (ARCO) algorithm. ARCO adopts the general framework of a well-known method, namely the minimal-redundancy-maximal-relevance (mRMR) criterion, but defines the terms "relevance" and "redundancy" in totally different ways. Such a modification looks trivial from the perspective of algorithmic design. Nevertheless, an experimental study on four gene expression data sets showed that feature subsets obtained by ARCO resulted in classifiers with significantly larger AUC than the feature subsets obtained by mRMR. Moreover, ARCO also outperformed the Feature Assessment by Sliding Thresholds algorithm, which was recently proposed for AUC maximization, and thus the efficacy of ARCO was validated.
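The abstract does not give ARCO's formulas, so the following is only a minimal sketch of the general idea it describes: an mRMR-style greedy search in which "relevance" is measured by a single feature's AUC and "redundancy" by a rank correlation coefficient between features. Several choices here are assumptions rather than details taken from the paper: Spearman's coefficient as the rank correlation, `max(AUC, 1 - AUC)` as the relevance score, the difference form (relevance minus mean redundancy) as the selection criterion, and the hypothetical function names `auc`, `ranks`, `spearman`, and `select_features`.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the Mann-Whitney statistic: P(positive score > negative score).

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def ranks(values):
    """Return 1-based average ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def select_features(columns, labels, k):
    """Greedy mRMR-style selection over a list of feature columns.

    Assumed criterion (not taken from the paper):
    relevance(j) - mean |rank correlation| with already selected features.
    """
    # Assumed relevance: distance of a single feature's AUC from chance,
    # in either direction.
    relevance = [max(a, 1 - a) for a in (auc(c, labels) for c in columns)]
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        def score(j):
            redundancy = sum(abs(spearman(columns[j], columns[s]))
                             for s in selected)
            return relevance[j] - redundancy / len(selected)
        rest = [j for j in range(len(columns)) if j not in selected]
        selected.append(max(rest, key=score))
    return selected


# Toy data: f1 carries the same ranking as f0, f2 is weaker but less redundant.
labels = [0, 0, 0, 1, 1, 1]
f0 = [1, 2, 3, 4, 5, 6]
f1 = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
f2 = [3, 1, 5, 2, 6, 4]
chosen = select_features([f0, f1, f2], labels, 2)
```

On this toy data the second pick is the weaker but less correlated feature `f2` rather than `f1`, which ranks the samples identically to `f0`. That trade-off between relevance and redundancy is what the mRMR framework is built around; how ARCO actually weighs the two terms should be taken from the paper itself.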
Pages: 400-405
Page count: 6