Reliability evaluation of individual predictions: a data-centric approach

Cited by: 0
Authors
Shahbazi, Nima [1 ]
Asudeh, Abolfazl [1 ]
Affiliations
[1] Univ Illinois, Chicago, IL 60607 USA
Source
VLDB JOURNAL | 2024 / Vol. 33 / Issue 04
Funding
National Science Foundation (USA);
Keywords
VORONOI DIAGRAMS;
DOI
10.1007/s00778-024-00857-w
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Machine learning models provide only probabilistic guarantees on the expected loss of random samples from the distribution represented by their training data. As a result, a model with high accuracy may or may not be reliable for predicting an individual query point. To address this issue, XAI aims to provide explanations of individual predictions, while approaches such as conformal predictions, probabilistic predictions, and prediction intervals count on the model's certainty in its prediction to identify unreliable cases. Conversely, instead of relying on the model itself, we look for insights in the training data. That is, following the fact that a model's performance is limited to the data it has been trained on, we ask "is a model trained on a given data set fit for making a specific prediction?". Specifically, we argue that a model's prediction is not reliable if (i) there were not enough instances similar to the query point in the training set, or (ii) there is a high fluctuation (uncertainty) in the vicinity of the query point in the training set. Using these two observations, we propose data-centric reliability measures for individual predictions and develop novel algorithms for efficient and effective computation of the reliability measures during inference time. The proposed algorithms learn the necessary components of the measures from the data itself and are sublinear, which makes them scalable to very large and multi-dimensional settings. Furthermore, an estimator is designed to enable no-data access during the inference time. We conduct extensive experiments using multiple real and synthetic data sets and different tasks, which reflect a consistent correlation between distrust values and model performance.
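The two observations in the abstract can be made concrete with a small sketch. This is not the authors' measure or algorithm: the function name `neighborhood_signals`, the choice of k nearest neighbours, Euclidean distance, and the majority-vote disagreement score are all assumptions made for illustration, and the brute-force scan below is linear in the training set size, not sublinear as the proposed algorithms are.

```python
import math
from collections import Counter


def neighborhood_signals(train_X, train_y, query, k=3):
    """Return (radius, disagreement) for the k training points nearest to query.

    Hypothetical illustration only, not the measure defined in the paper.
    radius:       distance to the k-th nearest neighbour; a large value
                  suggests few similar training instances (observation i).
    disagreement: fraction of the k neighbours outside the majority class;
                  a large value suggests label fluctuation (observation ii).
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the training set size")

    # Brute-force scan: compute every distance, then keep the k smallest.
    pairs = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = pairs[:k]

    radius = nearest[-1][0]
    majority_count = Counter(y for _, y in nearest).most_common(1)[0][1]
    disagreement = 1.0 - majority_count / k
    return radius, disagreement


# Toy data, invented for this example.
train_X = [(0, 0), (0, 1), (1, 0), (10, 10)]
train_y = [0, 0, 1, 1]
near = neighborhood_signals(train_X, train_y, (0, 0), k=3)
far = neighborhood_signals(train_X, train_y, (50, 50), k=3)
```

On this toy data the query far from all training points yields a larger radius than the query inside the cluster, which is the kind of signal a data-centric distrust score could build on.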
Pages: 1203 - 1230
Number of pages: 28