A Robust AUC Maximization Framework With Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification

Cited by: 16
Authors
Ren, Ke [1, 2]
Yang, Haichuan [1, 2]
Zhao, Yu [1, 3]
Chen, Wu [4, 5]
Xue, Mingshan [4, 5]
Miao, Hongyu [6]
Huang, Shuai [7]
Liu, Ji [1, 2]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
[2] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY 14627 USA
[3] Southwestern Univ Finance & Econ, Chengdu 611130, Sichuan, Peoples R China
[4] Baylor Coll Med, Dept Neurosci, Houston, TX 77030 USA
[5] Texas Childrens Hosp, Cain Fdn Labs, Jan & Dan Duncan Neurol Res Inst, Houston, TX 77030 USA
[6] Univ Texas Hlth Sci Ctr Houston, Houston, TX USA
[7] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA);
Keywords
Area under the curve (AUC) maximization; feature selection; outlier detection; positive-unlabeled (PU) learning; SUPPORT; SVM; ROC;
DOI
10.1109/TNNLS.2018.2870666
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we observe only a few samples labeled as "positive" together with a large volume of "unlabeled" samples that may contain both positive and negative samples. Building robust classifiers for the PU problem is very challenging, especially for complex data in which negative samples dominate and mislabeled samples or corrupted features exist. To address these three issues, we propose a robust learning framework that unifies area under the curve (AUC) maximization (a robust metric for biased labels), outlier detection (for excluding wrong labels), and feature selection (for excluding corrupted features). Generalization error bounds are provided for the proposed model; they give valuable insight into the theoretical performance of the method and lead to useful practical guidance, e.g., when training a model, the included unlabeled samples are sufficient as long as their number is comparable to the number of positive samples. Empirical comparisons and two real-world applications, surgical site infection (SSI) detection and EEG seizure detection, are also conducted to show the effectiveness of the proposed model.
Pages: 3072 - 3083
Page count: 12
Related papers
30 in total
  • [1] Fast AUC Maximization Learning Machine With Simultaneous Outlier Detection
    Sun, Yichen
    Vong, Chi Man
    Wang, Shitong
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (11) : 6843 - 6857
  • [2] Reflective action selection based on positive-unlabeled learning and causality detection model
    Tanaka, Shohei
    Yoshino, Koichiro
    Sudoh, Katsuhito
    Nakamura, Satoshi
    COMPUTER SPEECH AND LANGUAGE, 2023, 78
  • [3] Simultaneous feature selection and outlier detection with optimality guarantees
    Insolia, Luca
    Kenney, Ana
    Chiaromonte, Francesca
    Felici, Giovanni
    BIOMETRICS, 2022, 78 (04) : 1592 - 1603
  • [4] Simultaneous variable selection and outlier detection using a robust genetic algorithm
    Wiegand, Patrick
    Pell, Randy
    Comas, Enric
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2009, 98 (02) : 108 - 114
  • [5] Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection
    Peng, Yang
    Luo, Bin
    Gao, Xiaoli
    SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 2022, 84 (02) : 694 - 707
  • [6] Robust Moderately Clipped LASSO for Simultaneous Outlier Detection and Variable Selection
    Yang Peng
    Bin Luo
    Xiaoli Gao
    Sankhya B, 2022, 84 : 694 - 707
  • [7] A Novel Outlier Detection with Feature Selection Enabled Streaming Data Classification
    Rajakumar, R.
    Devi, S. Sathiya
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2023, 35 (02) : 2101 - 2116
  • [8] Outlier detection in classification based on feature-selection-based regression
    Su, Jinxia
    Liu, Qiwen
    Cui, Jingke
    Knowledge and Information Systems, 2025, 67 (02) : 1399 - 1414
  • [9] Multi-Objective Evolutionary Simultaneous Feature Selection and Outlier Detection for Regression
    Jimenez, Fernando
    Lucena-Sanchez, Estrella
    Sanchez, Gracia
    Sciavicco, Guido
    IEEE ACCESS, 2021, 9 : 135675 - 135688
  • [10] An Ensemble Filter Feature Selection Method and Outlier Detection Method for Multiclass Classification
    Ndirangu, Dalton
    Mwangi, Waweru
    Nderu, Lawrence
    2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE AND COMPUTER APPLICATIONS (ICSCA 2019), 2019 : 373 - 379