Cautious Classification with Data Missing Not at Random Using Generative Random Forests

Times cited: 0
Authors
Llerena, Julissa Villanueva [1]
Maua, Denis Deratani [1]
Antonucci, Alessandro [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Sao Paulo, Brazil
[2] Dalle Molle Inst Artificial Intelligence, Lugano, Switzerland
Keywords
Probabilistic circuits; Generative random forests; Missing data; Conservative inference rule; Credal networks; Inference
DOI
10.1007/978-3-030-86772-0_21
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip Random Forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
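The contrast the abstract draws between marginalizing a missing value and checking every possible imputation can be illustrated with a small brute-force sketch. This is not the paper's algorithm, which works efficiently on probabilistic circuits; it is a toy enumeration over a hypothetical two-feature naive-Bayes-style model, and all names and probabilities (`PRIOR`, `COND`, `predict`, `robust_classes`) are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical toy model: P(y, x1, x2) = P(y) * P(x1 | y) * P(x2 | y), all binary.
PRIOR = {0: 0.5, 1: 0.5}
COND = [
    {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},  # COND[0][y][x1] = P(x1 | y)
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},  # COND[1][y][x2] = P(x2 | y)
]


def joint(y, x):
    """P(y, observed part of x); a feature set to None is marginalized out."""
    p = PRIOR[y]
    for i, value in enumerate(x):
        if value is not None:  # a missing feature contributes a factor of 1
            p *= COND[i][y][value]
    return p


def predict(x):
    """Most probable class, marginalizing over any feature that is None."""
    return max(PRIOR, key=lambda y: joint(y, x))


def robust_classes(x, non_ignorable):
    """Classes predicted under at least one completion of the missing
    features listed in non_ignorable (brute-force enumeration)."""
    missing = [i for i in non_ignorable if x[i] is None]
    classes = set()
    for values in product([0, 1], repeat=len(missing)):
        completed = list(x)
        for i, value in zip(missing, values):
            completed[i] = value
        classes.add(predict(completed))
    return classes


# Marginalization commits to a single class ...
print(predict([1, None]))
# ... while enumerating completions shows whether that class is robust.
print(robust_classes([1, None], non_ignorable=[1]))
```

A classification would be treated as robust here only when `robust_classes` returns a single class. The enumeration grows exponentially in the number of missing features, which is the cost an efficient circuit-based algorithm is meant to avoid.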
Pages: 284-298
Number of pages: 15
Related papers
50 in total
  • [31] Random forests for classification in ecology
    Cutler, D. Richard
    Edwards, Thomas C., Jr.
    Beard, Karen H.
    Cutler, Adele
    Hess, Kyle T.
    ECOLOGY, 2007, 88 (11) : 2783 - 2792
  • [32] Classification and interaction in random forests
    Denisko, Danielle
    Hoffman, Michael M.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2018, 115 (08) : 1690 - 1692
  • [33] Random forests for multiclass classification: Random MultiNomial Logit
    Prinzie, Anita
    Van den Poel, Dirk
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (03) : 1721 - 1732
  • [34] Regression with missing data, a comparison study of techniques based on random forests
    Gomez-Mendez, Irving
    Joly, Emilien
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2023, 93 (12) : 1924 - 1949
  • [35] Segmentation of PMSE Data Using Random Forests
    Jozwicki, Dorota
    Sharma, Puneet
    Mann, Ingrid
    Hoppe, Ulf-Peter
    REMOTE SENSING, 2022, 14 (13)
  • [36] Classification of Linear Structures in Mammograms Using Random Forests
    Chen, Zezhi
    Berks, Michael
    Astley, Susan
    Taylor, Chris
    DIGITAL MAMMOGRAPHY, 2010, 6136 : 153 - 160
  • [37] Web Document Classification by Keywords Using Random Forests
    Klassen, Myungsook
    Paturi, Nikhila
    NETWORKED DIGITAL TECHNOLOGIES, PT 2, 2010, 88 : 256 - 261
  • [38] Baker's Cyst Classification Using Random Forests
    Ciszkiewicz, Adam
    Milewski, Grzegorz
    Lorkowski, Jacek
    PROCEEDINGS OF THE 2018 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2018, : 97 - 100
  • [39] Pathway analysis using random forests classification and regression
    Pang, Herbert
    Lin, Aiping
    Holford, Matthew
    Enerson, Bradley E.
    Lu, Bin
    Lawton, Michael P.
    Floyd, Eugenia
    Zhao, Hongyu
    BIOINFORMATICS, 2006, 22 (16) : 2028 - 2036
  • [40] Classification of Immunosignature Using Random Forests for Cancer Diagnosis
    Zarzar, Mouayad
    Razak, Eliza
    Htike, Zaw Zaw
    Yusof, Faridah
    ADVANCED SCIENCE LETTERS, 2015, 21 (11) : 3449 - 3452