A Robust AUC Maximization Framework With Simultaneous Outlier Detection and Feature Selection for Positive-Unlabeled Classification

Cited by: 16
Authors
Ren, Ke [1, 2]
Yang, Haichuan [1, 2]
Zhao, Yu [1, 3]
Chen, Wu [4, 5]
Xue, Mingshan [4, 5]
Miao, Hongyu [6]
Huang, Shuai [7]
Liu, Ji [1, 2]
Affiliations
[1] Univ Rochester, Dept Comp Sci, Rochester, NY 14627 USA
[2] Univ Rochester, Dept Elect & Comp Engn, Rochester, NY 14627 USA
[3] Southwestern Univ Finance & Econ, Chengdu 611130, Sichuan, Peoples R China
[4] Baylor Coll Med, Dept Neurosci, Houston, TX 77030 USA
[5] Texas Childrens Hosp, Cain Fdn Labs, Jan & Dan Duncan Neurol Res Inst, Houston, TX 77030 USA
[6] Univ Texas Hlth Sci Ctr Houston, Houston, TX USA
[7] Univ Washington, Dept Ind & Syst Engn, Seattle, WA 98195 USA
Funding
National Science Foundation (USA)
Keywords
Area under the curve (AUC) maximization; feature selection; outlier detection; positive-unlabeled (PU) learning; SUPPORT; SVM; ROC;
DOI
10.1109/TNNLS.2018.2870666
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Positive-unlabeled (PU) classification is a common scenario in real-world applications such as healthcare, text classification, and bioinformatics, in which we observe only a few samples labeled as "positive" together with a large volume of "unlabeled" samples that may contain both positive and negative samples. Building robust classifiers for the PU problem is very challenging, especially for complex data in which negative samples dominate and mislabeled samples or corrupted features are present. To address these three issues, we propose a robust learning framework that unifies area under the curve (AUC) maximization (a metric that is robust to biased labels), outlier detection (for excluding wrong labels), and feature selection (for excluding corrupted features). We provide generalization error bounds for the proposed model; they give valuable insight into the theoretical performance of the method and lead to useful practical guidance. For example, we find that when training a model, it is sufficient to include a number of unlabeled samples comparable to the number of positive samples. Empirical comparisons and two real-world applications, on surgical site infection (SSI) and EEG seizure detection, are also conducted to show the effectiveness of the proposed model.
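To make the idea of AUC maximization with feature selection in the PU setting more concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a linear scorer, a pairwise logistic surrogate for AUC computed between positive and unlabeled samples, and an L1 penalty handled by soft-thresholding, and it omits the outlier detection component entirely. The function names `fit_pu_auc` and `pairwise_auc` and the parameters `lam`, `lr`, and `n_iter` are hypothetical.

```python
import numpy as np


def fit_pu_auc(X_pos, X_unl, lam=0.01, lr=0.1, n_iter=500):
    """Illustrative only: learn a sparse linear scorer w so that positive
    samples tend to score higher than unlabeled ones.

    Minimizes mean(log(1 + exp(-w . (x_pos - x_unl)))) + lam * ||w||_1
    with proximal gradient descent (assumed setup, not the paper's method).
    """
    d = X_pos.shape[1]
    w = np.zeros(d)
    # All pairwise differences x_pos - x_unl, shape (n_pos * n_unl, d).
    diffs = (X_pos[:, None, :] - X_unl[None, :, :]).reshape(-1, d)
    for _ in range(n_iter):
        margins = diffs @ w
        # sigmoid(-margin), computed stably as exp(-log(1 + exp(margin))).
        pair_weights = np.exp(-np.logaddexp(0.0, margins))
        grad = -(diffs * pair_weights[:, None]).mean(axis=0)
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty, which
        # drives the weights of uninformative features toward zero.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return w


def pairwise_auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of (pos, neg) pairs ranked correctly,
    counting ties as half."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()
```

Treating the unlabeled set as a noisy stand-in for negatives is a common simplification in PU learning; a pairwise ranking objective of this kind depends only on the relative order of scores, which is one reason AUC-style criteria are used when labels are biased.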
Pages: 3072-3083
Number of pages: 12