Cautious Classification with Data Missing Not at Random Using Generative Random Forests

Authors
Llerena, Julissa Villanueva [1]
Maua, Denis Deratani [1]
Antonucci, Alessandro [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Sao Paulo, Brazil
[2] Dalle Molle Inst Artificial Intelligence, Lugano, Switzerland
Keywords
Probabilistic circuits; Generative random forests; Missing data; Conservative inference rule; Credal networks; Inference
DOI
10.1007/978-3-030-86772-0_21
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip Random Forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
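
To illustrate the contrast the abstract draws between marginalizing missing values and checking robustness to their imputations, here is a minimal brute-force sketch. It is not the paper's algorithm, which works efficiently on probabilistic circuits. It assumes a hypothetical toy naive Bayes model with a binary class and two binary features; the names `PRIOR`, `LIKELIHOOD`, `joint`, `predict`, and `cautious_predict` and all probability values are invented for illustration. The cautious classifier enumerates every completion of the non-ignorable missing features and collects each class that is the top prediction under at least one completion.

```python
from itertools import product

# Hypothetical toy model (illustrative numbers only): naive Bayes over a
# binary class and two binary features "x1" and "x2".
PRIOR = {0: 0.5, 1: 0.5}
# LIKELIHOOD[feature][class][value] = P(feature = value | class)
LIKELIHOOD = {
    "x1": {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},
    "x2": {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
}


def joint(c, assignment):
    """P(class = c, assignment); features absent from assignment are marginalized out."""
    p = PRIOR[c]
    for name, value in assignment.items():
        p *= LIKELIHOOD[name][c][value]
    return p


def predict(assignment):
    """Most probable class given the assigned features (marginalization baseline)."""
    return max(PRIOR, key=lambda c: joint(c, assignment))


def cautious_predict(observed, nonignorable, domain=(0, 1)):
    """Classes that are the top prediction under at least one completion
    of the non-ignorable missing features (brute-force enumeration)."""
    classes = set()
    for values in product(domain, repeat=len(nonignorable)):
        completed = dict(observed)
        completed.update(zip(nonignorable, values))
        classes.add(predict(completed))
    return classes


if __name__ == "__main__":
    # Marginalizing x2 yields a single class, but the completions disagree.
    print(predict({"x1": 1}))
    print(cautious_predict({"x1": 1}, ["x2"]))
    # Here every completion of x1 agrees, so the classification is robust.
    print(cautious_predict({"x2": 1}, ["x1"]))
```

When the returned set contains a single class, the classification does not depend on how the missing value is imputed. A larger set signals that the marginalized prediction may be overconfident. The enumeration is exponential in the number of missing features, so it only serves to make the idea concrete on tiny models.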
Pages: 284-298
Number of pages: 15