Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection

Cited by: 0
Authors
Han, Yujin [1 ,4 ]
Xu, Mingwenchan [2 ,4 ]
Guan, Leying [3 ]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Northwestern Univ, Dept IEMS, Evanston, IL USA
[3] Yale Univ, Dept Biostat, New Haven, CT 06520 USA
[4] Yale Univ, New Haven, CT USA
Keywords
PREDICTIVE INFERENCE; COVARIATE SHIFT;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Random Forests classifier, a widely used off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical settings such as medical diagnosis and network attack detection, discrepancies between the training and test sets, including novel outlier samples that never appear during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction C(x). Instead of optimizing over the training distribution, CSForest uses unlabeled test samples to improve accuracy and flags unseen outliers by returning an empty set. Theoretically, we show that CSForest covers the true labels of previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods on synthetic examples and a range of real-world datasets under different types of distribution change in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest maintains consistently good performance as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yujinhan98/CSForest
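A central idea in the abstract is the set-valued prediction C(x), in which an empty set flags a test point as a possible outlier. The sketch below illustrates only that output format, using a deliberately simpler stand-in: class-conditional split conformal prediction with a hypothetical nonconformity score (Euclidean distance to a class centroid). It does not implement Jackknife+aB, semi-supervised tree ensembles, or the code in the linked repository; the helper names `centroid`, `score`, `fit_class_thresholds`, and `predict_set`, and the toy data, are illustrative assumptions. Because each class k gets its own calibrated threshold, the guarantee P(Y ∈ C(X) | Y = k) ≥ 1 - alpha holds class by class (assuming the calibration and test points of class k are exchangeable). It therefore does not depend on how often each inlier class appears in the test set, which is one standard way to obtain coverage under label shift.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(dim))


def score(x: Point, center: Point) -> float:
    """Hypothetical nonconformity score: distance to a class centroid.
    (An assumption for illustration; CSForest builds its scores from
    semi-supervised tree ensembles instead.)"""
    return math.dist(x, center)


def fit_class_thresholds(
    train: Dict[str, List[Point]],
    calib: Dict[str, List[Point]],
    alpha: float,
) -> Dict[str, Tuple[Point, float]]:
    """Per class: fit a centroid on `train`, then calibrate a score threshold
    on held-out `calib` points of the same class (split conformal)."""
    model: Dict[str, Tuple[Point, float]] = {}
    for label, pts in train.items():
        center = centroid(pts)
        scores = sorted(score(x, center) for x in calib[label])
        n = len(scores)
        rank = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
        # If the rank exceeds n, the threshold is +inf: the class is never excluded.
        q = scores[rank - 1] if rank <= n else math.inf
        model[label] = (center, q)
    return model


def predict_set(model: Dict[str, Tuple[Point, float]], x: Point) -> Set[str]:
    """C(x): all classes whose score at x is within their threshold.
    An empty set flags x as a possible outlier from an unseen class."""
    return {label for label, (center, q) in model.items() if score(x, center) <= q}


# Toy usage with two inlier classes (hypothetical data).
train = {"a": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2)],
         "b": [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]}
calib = {"a": [(0.1, 0.0), (0.0, 0.3), (-0.2, -0.1), (0.3, 0.2)],
         "b": [(5.1, 5.0), (4.9, 4.8), (5.0, 5.3), (5.3, 5.1)]}
model = fit_class_thresholds(train, calib, alpha=0.25)
print(predict_set(model, (0.05, 0.1)))  # {'a'}
print(predict_set(model, (5.0, 5.1)))   # {'b'}
print(predict_set(model, (2.5, 2.5)))   # set(), flagged as an outlier
```

With alpha = 0.25 and four calibration points per class, the conformal rank is ceil(5 × 0.75) = 4, so each threshold is the largest calibration score of its class; a point far from both classes exceeds both thresholds and receives the empty set. If alpha is so small that the rank exceeds the calibration size (for example alpha = 0.1 here), the threshold becomes infinite and that class can never be excluded, so a finite calibration set limits how small alpha can usefully be.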
Pages: 22
Related papers (50 total)
  • [21] Probabilistic semi-supervised random subspace sparse representation for classification
    Zhao, Zhuang
    Bai, Lianfa
    Zhang, Yi
    Han, Jing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (18) : 23245 - 23271
  • [22] SEMI-SUPERVISED DISCRIMINATIVE RANDOM FIELD FOR HYPERSPECTRAL IMAGE CLASSIFICATION
    Li, Jun
    Bioucas-Dias, Jose M.
    Plaza, Antonio
    2012 4TH WORKSHOP ON HYPERSPECTRAL IMAGE AND SIGNAL PROCESSING (WHISPERS), 2012
  • [23] Semi-supervised Classification and Segmentation of Forest Fire Using Autoencoders
    Koottungal, Akash
    Pandey, Shailesh
    Nambiar, Athira
    ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, ACIVS 2023, 2023, 14124 : 27 - 39
  • [25] Semi-Supervised Random Forests
    Leistner, Christian
    Saffari, Amir
    Santner, Jakob
    Bischof, Horst
    2009 IEEE 12TH INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2009 : 506 - 513
  • [26] SEMI-SUPERVISED IMAGE CLASSIFICATION IN LARGE DATASETS BY USING RANDOM FOREST AND FUZZY QUANTIFICATION OF THE SALIENT OBJECT
    Merdassi, Hager
    Barhoumi, Walid
    Zagrouba, Ezzeddine
    2014 INTERNATIONAL WORKSHOP ON COMPUTATIONAL INTELLIGENCE FOR MULTIMEDIA UNDERSTANDING (IWCIM), 2014
  • [27] ENSEMBLE MARGIN BASED SEMI-SUPERVISED RANDOM FOREST FOR THE CLASSIFICATION OF HYPERSPECTRAL IMAGE WITH LIMITED TRAINING DATA
    Feng, Wei
    Huang, Wenjiang
    Dauphin, Gabriel
    Xia, Junshi
    Quan, Yinghui
    Ye, Huichun
    Dong, Yingying
    2019 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2019), 2019 : 971 - 974
  • [28] Semi-supervised classification trees
    Levatic, Jurica
    Ceci, Michelangelo
    Kocev, Dragi
    Dzeroski, Saso
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2017, 49 (03) : 461 - 486
  • [29] Watersheds for Semi-Supervised Classification
    Challa, Aditya
    Danda, Sravan
    Sagar, B. S. Daya
    Najman, Laurent
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (05) : 720 - 724