Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Times cited: 25
Authors
Alves, Vinicius M. [1]
Auerbach, Scott S. [2]
Kleinstreuer, Nicole [3]
Rooney, John P. [4]
Muratov, Eugene N. [5, 6]
Rusyn, Ivan [7]
Tropsha, Alexander [5]
Schmitt, Charles [1]
Affiliations
[1] NIEHS, Off Data Sci, Div Natl Toxicol Program DNTP, Durham, NC 27560 USA
[2] NIEHS, Toxinformat Grp, Predict Toxicol Branch, DNTP, Durham, NC 27560 USA
[3] NIEHS, Natl Toxicol Program Interagcy Ctr Evaluat Altern, Sci Directors Off, DNTP, Durham, NC 27560 USA
[4] Integrated Lab Syst LLC, Morrisville, NC USA
[5] Univ N Carolina, UNC Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[6] Univ Fed Paraiba, Dept Pharmaceut Sci, Joao Pessoa, Paraiba, Brazil
[7] Texas A&M Univ, Coll Vet Med & Biomed Sci, Dept Vet Integrat Biosci, College Stn, TX USA
Source
ATLA-ALTERNATIVES TO LABORATORY ANIMALS | 2021 / Vol. 49 / No. 3
Keywords
artificial intelligence; data curation; data quality; data reproducibility; QSAR; PREDICTION; REPRODUCIBILITY; TOXICOLOGY; TOXICITY; STRATEGY; VERIFY; BEWARE; CHEMBL; TRUST
DOI
10.1177/02611929211029635
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Subject classification code
1001
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, a major and underappreciated challenge in developing robust and predictive AI models is the impact of input data quality on model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as to similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, although models generated with uncurated data had a 7-24% higher correct classification rate (CCR), this apparent performance was in fact inflated by the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, helping to ensure that the models yield reliable predictions of chemical toxicity.
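The abstract's central point, that duplicates made uncurated models look better than they are, can be illustrated with a toy sketch. It assumes CCR is the balanced accuracy (the mean of sensitivity and specificity), as the term is commonly defined in QSAR work, and it uses hypothetical structure keys, a deliberately naive classifier that only memorizes training labels, and a deterministic alternating train/test split. None of these are the authors' models, datasets, or curation workflow; real curation would first canonicalize structures (for example to InChIKeys), whereas here the keys are simply assumed to be canonical already.

```python
from collections import Counter

# Hypothetical uncurated records: (canonical structure key, label), 1 = positive, 0 = negative.
# Compounds A-D each appear twice, as duplicates often do in merged toxicity datasets.
RAW_RECORDS = [("A", 1), ("A", 1), ("B", 0), ("B", 0), ("C", 0), ("C", 0),
               ("D", 1), ("D", 1), ("E", 1), ("F", 0), ("G", 0), ("H", 1), ("I", 1)]


def ccr(y_true, y_pred):
    """Correct classification rate: mean of sensitivity and specificity (binary 0/1 labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def deduplicate(records):
    """Keep one record per structure key; drop keys whose replicate labels conflict."""
    labels = {}  # dicts preserve insertion order, so output follows first appearance
    for key, label in records:
        labels.setdefault(key, set()).add(label)
    return [(key, next(iter(ls))) for key, ls in labels.items() if len(ls) == 1]


def memorizing_model(train):
    """Naive lookup 'model': recall the training label, else predict the majority class."""
    lookup = dict(train)
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda key: lookup.get(key, majority)


def evaluate(records):
    """Train on even-indexed records, test on odd-indexed ones, and return the test CCR."""
    train, test = records[0::2], records[1::2]
    model = memorizing_model(train)
    y_true = [label for _, label in test]
    y_pred = [model(key) for key, _ in test]
    return ccr(y_true, y_pred)


if __name__ == "__main__":
    print(f"uncurated CCR: {evaluate(RAW_RECORDS):.2f}")
    print(f"curated CCR:   {evaluate(deduplicate(RAW_RECORDS)):.2f}")
```

Run as a script, this reports a CCR of 0.83 on the uncurated records and 0.50 after deduplication. The entire gap comes from duplicate copies that land in both the training and test sets, which the lookup model simply recalls; once each structure appears only once, the same model performs at chance on genuinely unseen compounds.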
Pages: 73-82
Number of pages: 10