Cautious Classification with Data Missing Not at Random Using Generative Random Forests

Times cited: 0
Authors
Llerena, Julissa Villanueva [1]
Maua, Denis Deratani [1]
Antonucci, Alessandro [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Sao Paulo, Brazil
[2] Dalle Molle Inst Artificial Intelligence, Lugano, Switzerland
Keywords
Probabilistic circuits; Generative random forests; Missing data; Conservative inference rule; Credal networks; Inference
DOI
10.1007/978-3-030-86772-0_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip Random Forest Classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
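The contrast the abstract draws between marginalizing a missing value and checking robustness to its imputations can be illustrated with a minimal sketch. This is not the authors' algorithm and does not use a probabilistic circuit: it assumes a hypothetical two-feature toy joint model with made-up probabilities, and it checks robustness by brute-force enumeration of completions, which is only feasible for tiny examples. All names (`P_C`, `P_X_GIVEN_C`, `classify_marginalizing`, `classify_cautiously`) are illustrative.

```python
from itertools import product

# Hypothetical toy joint model P(c, x1, x2) = P(c) * P(x1 | c) * P(x2 | c)
# over binary variables. All numbers are made up for illustration.
P_C = {0: 0.5, 1: 0.5}
P_X_GIVEN_C = [
    {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},  # P(x1 | c), indexed [c][x1]
    {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}},  # P(x2 | c), indexed [c][x2]
]


def joint(c, x):
    """Return P(c, observed part of x); None entries are marginalized out."""
    p = P_C[c]
    for i, value in enumerate(x):
        if value is not None:
            # Skipping a factor sums it to 1, i.e. marginalizes that feature.
            p *= P_X_GIVEN_C[i][c][value]
    return p


def classify_marginalizing(x):
    """Pick the most probable class after marginalizing missing features."""
    return max(P_C, key=lambda c: joint(c, x))


def classify_cautiously(x):
    """Return every class that wins under at least one completion of x.

    A singleton result means the prediction is robust to how the missing
    values are filled in; a larger set signals that it is not.
    """
    missing = [i for i, value in enumerate(x) if value is None]
    winners = set()
    for values in product([0, 1], repeat=len(missing)):
        completed = list(x)
        for i, value in zip(missing, values):
            completed[i] = value
        winners.add(classify_marginalizing(completed))
    return winners


# x1 missing, x2 = 1: marginalization commits to a single class,
# while the cautious check reports that the answer depends on x1.
print(classify_marginalizing([None, 1]))
print(classify_cautiously([None, 1]))
```

In this toy setting the enumeration is exponential in the number of missing features; the abstract's point is that an efficient algorithm can replace it when the model satisfies certain constraints.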
Pages: 284-298
Number of pages: 15
Related papers
50 in total
  • [1] Cautious weighted random forests
    Zhang, Haifei
    Quost, Benjamin
    Masson, Marie-Helene
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 213
  • [2] Variable selection by Random Forests using data with missing values
    Hapfelmeier, A.
    Ulm, K.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 80 : 129 - 139
  • [3] Classification of Urban LiDAR data using Conditional Random Field and Random Forests
    Niemeyer, Joachim
    Rottensteiner, Franz
    Soergel, Uwe
    2013 JOINT URBAN REMOTE SENSING EVENT (JURSE), 2013, : 139 - 142
  • [4] Deep Generative Imputation Model for Missing Not At Random Data
    Chen, Jialei
    Xu, Yuanbo
    Wang, Pengyang
    Yang, Yongjian
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 316 - 325
  • [5] Identifiable Generative Models for Missing Not at Random Data Imputation
    Ma, Chao
    Zhang, Cheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [6] Big Genome Data Classification with Random Forests Using VariantSpark
    Devi, A. Shobana
    Maragatham, G.
    INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES (ICCNCT 2018), 2019, 15 : 599 - 614
  • [7] Classification Using Streaming Random Forests
    Abdulsalam, Hanady
    Skillicorn, David B.
    Martin, Patrick
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (01) : 22 - 36
  • [8] Explaining Cautious Random Forests via Counterfactuals
    Zhang, Haifei
    Quost, Benjamin
    Masson, Marie-Helene
    BUILDING BRIDGES BETWEEN SOFT AND STATISTICAL METHODOLOGIES FOR DATA SCIENCE, 2023, 1433 : 390 - 397
  • [9] Random Subspace Sampling for Classification with Missing Data
    Cao, Yun-Hao
    Wu, Jian-Xin
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (02) : 472 - 486
  • [10] Classification of DNA microarray data with random forests
    Stokowy, T.
    Advances in Intelligent and Soft Computing, 2010, 69 : 305 - 308