Conformalized Semi-supervised Random Forest for Classification and Abnormality Detection

Authors
Han, Yujin [1, 4]
Xu, Mingwenchan [2, 4]
Guan, Leying [3 ]
Affiliations
[1] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Northwestern Univ, Dept IEMS, Evanston, IL USA
[3] Yale Univ, Dept Biostat, New Haven, CT 06520 USA
[4] Yale Univ, New Haven, CT USA
Keywords
PREDICTIVE INFERENCE; COVARIATE SHIFT;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Random Forests classifier, a widely used off-the-shelf classification tool, assumes, like other standard classifiers, that training and test samples come from the same distribution. However, in safety-critical scenarios such as medical diagnosis and network attack detection, discrepancies between the training and test sets, including the potential presence of novel outlier samples that do not appear during training, can pose significant challenges. To address this problem, we introduce the Conformalized Semi-Supervised Random Forest (CSForest), which couples the conformalization technique Jackknife+aB with semi-supervised tree ensembles to construct a set-valued prediction C(x). Instead of optimizing over the training distribution, CSForest employs unlabeled test samples to enhance accuracy and flags unseen outliers by generating an empty set. Theoretically, we show that CSForest covers the true labels of previously observed inlier classes under arbitrary label shift in the test data. We compare CSForest with state-of-the-art methods using synthetic examples and various real-world datasets, under different types of distribution changes in the test domain. Our results highlight CSForest's effective prediction of inliers and its ability to detect outlier samples unique to the test data. In addition, CSForest shows persistently good performance as the sizes of the training and test sets vary. Code for CSForest is available at https://github.com/yujinhan98/CSForest
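The idea of a set-valued prediction C(x) that becomes empty for an unseen outlier can be illustrated with a much simpler conformal procedure. The sketch below is not CSForest and does not implement Jackknife+aB: it assumes a plain split-conformal scheme with a hypothetical distance-to-class-mean score, uses no tree ensemble, and ignores the unlabeled test samples. All function names and the toy data are assumptions made for illustration only.

```python
import numpy as np

def conformity(X, mean):
    # Hypothetical score: larger means "more typical" of the class.
    return -np.linalg.norm(X - mean, axis=1)

def prediction_sets(X_fit, y_fit, X_cal, y_cal, X_test, alpha=0.1):
    """Class-wise split-conformal prediction sets (illustrative only)."""
    sets = [[] for _ in range(len(X_test))]
    for k in np.unique(y_fit):
        mean_k = X_fit[y_fit == k].mean(axis=0)
        cal_scores = conformity(X_cal[y_cal == k], mean_k)
        test_scores = conformity(X_test, mean_k)
        # Conformal p-value: rank of the test score among calibration scores.
        below = (cal_scores[None, :] <= test_scores[:, None]).sum(axis=1)
        pvals = (1 + below) / (len(cal_scores) + 1)
        # Class k enters C(x) when its p-value exceeds alpha.
        for i in np.flatnonzero(pvals > alpha):
            sets[i].append(int(k))
    return sets

def sample(rng, n):
    # Toy data: two Gaussian inlier classes centred at (0, 0) and (6, 0).
    X = np.concatenate([rng.normal([0.0, 0.0], 1.0, (n, 2)),
                        rng.normal([6.0, 0.0], 1.0, (n, 2))])
    y = np.repeat([0, 1], n)
    return X, y

rng = np.random.default_rng(0)
X_fit, y_fit = sample(rng, 100)   # used to fit the class means
X_cal, y_cal = sample(rng, 100)   # held out for calibration
# Two inlier-like points and one point far from both classes.
X_test = np.array([[0.0, 0.0], [6.0, 0.0], [30.0, 30.0]])
sets = prediction_sets(X_fit, y_fit, X_cal, y_cal, X_test, alpha=0.1)
print(sets)
```

In this toy setting the point far from both classes receives a small p-value for every class, so its set is empty, which mirrors how an empty C(x) is read as an outlier flag in the abstract.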
Pages: 22