Cautious Classification with Data Missing Not at Random Using Generative Random Forests

Authors
Llerena, Julissa Villanueva [1]
Maua, Denis Deratani [1]
Antonucci, Alessandro [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Sao Paulo, Brazil
[2] Dalle Molle Inst Artificial Intelligence, Lugano, Switzerland
Keywords
Probabilistic circuits; Generative random forests; Missing data; Conservative inference rule; Credal networks; Inference
DOI
10.1007/978-3-030-86772-0_21
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing data present a challenge for most machine learning approaches. When a generative probabilistic model of the data is available, an effective approach is to marginalize out the missing values. Probabilistic circuits are expressive generative models that allow for efficient exact inference. However, data are often missing not at random, and marginalization can then lead to overconfident and wrong conclusions. In this work, we develop an efficient algorithm for assessing the robustness of classifications made by probabilistic circuits to imputations of the non-ignorable portion of missing data at prediction time. We show that our algorithm is exact when the model satisfies certain constraints, which is the case for the recently proposed Generative Random Forests, which equip Random Forest classifiers with a full probabilistic model of the data. We also show how to extend our approach to handle non-ignorable missing data at training time.
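The contrast between marginalizing missing values and checking robustness to their imputations can be illustrated with a small brute-force sketch. This is not the paper's algorithm, which works efficiently on probabilistic circuits; it assumes a hypothetical two-feature naive Bayes model as a stand-in generative model, and every name and probability below is made up for illustration. A classification is treated as robust when the same class wins under every completion of the missing features.

```python
from itertools import product

# Hypothetical toy generative model (an assumption, not from the paper):
# naive Bayes over a binary class C and two binary features X0, X1.
PRIOR = {0: 0.5, 1: 0.5}
# LIKELIHOOD[i][c][v] = P(X_i = v | C = c)
LIKELIHOOD = [
    {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}},  # weakly informative feature
    {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},  # strongly informative feature
]


def joint(c, x):
    """P(C = c, observed part of x); None entries are marginalized out.

    In naive Bayes, summing a feature out simply drops its factor.
    """
    p = PRIOR[c]
    for i, v in enumerate(x):
        if v is not None:
            p *= LIKELIHOOD[i][c][v]
    return p


def marginal_prediction(x):
    """Most probable class after marginalizing the missing features."""
    return max(PRIOR, key=lambda c: joint(c, x))


def cautious_prediction(x):
    """Set of classes that win under at least one completion of x.

    Brute force over all completions; a singleton result means the
    classification is robust to how the missing values are imputed.
    """
    missing = [i for i, v in enumerate(x) if v is None]
    winners = set()
    for values in product([0, 1], repeat=len(missing)):
        completed = list(x)
        for i, v in zip(missing, values):
            completed[i] = v
        winners.add(marginal_prediction(completed))
    return winners


if __name__ == "__main__":
    for instance in [(1, None), (None, 1)]:
        print(instance, marginal_prediction(instance), cautious_prediction(instance))
```

In this toy model, marginalization returns a single class for both instances, but only the instance whose strongly informative feature is observed keeps that class under every completion. The enumeration is exponential in the number of missing features, which is the cost an efficient algorithm of the kind described in the abstract would avoid.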
Pages: 284-298
Number of pages: 15