Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Cited by: 25
Authors
Alves, Vinicius M. [1 ]
Auerbach, Scott S. [2 ]
Kleinstreuer, Nicole [3 ]
Rooney, John P. [4 ]
Muratov, Eugene N. [5 ,6 ]
Rusyn, Ivan [7 ]
Tropsha, Alexander [5 ]
Schmitt, Charles [1 ]
Affiliations
[1] NIEHS, Off Data Sci, Div Natl Toxicol Program DNTP, Durham, NC 27560 USA
[2] NIEHS, Toxinformat Grp, Predict Toxicol Branch, DNTP, Durham, NC 27560 USA
[3] NIEHS, Natl Toxicol Program Interagcy Ctr Evaluat Altern, Sci Directors Off, DNTP, Durham, NC 27560 USA
[4] Integrated Lab Syst LLC, Morrisville, NC USA
[5] Univ N Carolina, UNC Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[6] Univ Fed Paraiba, Dept Pharmaceut Sci, Joao Pessoa, Paraiba, Brazil
[7] Texas A&M Univ, Coll Vet Med & Biomed Sci, Dept Vet Integrat Biosci, College Stn, TX USA
Source
ATLA-ALTERNATIVES TO LABORATORY ANIMALS | 2021, Vol. 49, Issue 03
Keywords
artificial intelligence; data curation; data quality; data reproducibility; QSAR; PREDICTION; REPRODUCIBILITY; TOXICOLOGY; TOXICITY; STRATEGY; VERIFY; BEWARE; CHEMBL; TRUST
DOI
10.1177/02611929211029635
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
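The mechanism behind the inflated CCR can be illustrated with a toy sketch. It is not the authors' workflow: it assumes CCR is computed as the mean of sensitivity and specificity (a common convention in QSAR work), it uses made-up compound identifiers and labels, and it stands in for a real model with a hypothetical "memorising" classifier that returns the label of an identical training record, or a fixed fallback class when none exists. The helper names `ccr`, `deduplicate`, and `leave_one_out_ccr` are illustrative only.

```python
from collections import defaultdict


def ccr(y_true, y_pred):
    """Correct classification rate, assumed here to be (sensitivity + specificity) / 2."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def deduplicate(records):
    """Keep one record per structure key; drop keys whose labels conflict."""
    labels = defaultdict(set)
    for key, label in records:
        labels[key].add(label)
    return sorted((k, next(iter(v))) for k, v in labels.items() if len(v) == 1)


def leave_one_out_ccr(records, fallback=0):
    """Leave-one-out evaluation of a hypothetical memorising classifier."""
    y_true, y_pred = [], []
    for i, (key, label) in enumerate(records):
        # Training set is every record except the held-out one.
        train = dict(records[:i] + records[i + 1:])
        y_true.append(label)
        # An exact duplicate in the training set leaks the held-out label.
        y_pred.append(train.get(key, fallback))
    return ccr(y_true, y_pred)


# Hypothetical data: each compound appears twice (1 = toxic, 0 = non-toxic).
raw = [("cmpd_A", 1), ("cmpd_A", 1), ("cmpd_B", 0), ("cmpd_B", 0),
       ("cmpd_C", 1), ("cmpd_C", 1), ("cmpd_D", 0), ("cmpd_D", 0)]
curated = deduplicate(raw)

print(leave_one_out_ccr(raw))      # 1.0: every held-out record has a twin in training
print(leave_one_out_ccr(curated))  # 0.5: no leakage, the classifier is at chance level
```

In practice the structure key would come from a standardised chemical representation rather than a raw identifier, and the gap would be smaller than in this extreme toy case; the point is only that duplicates let held-out records leak into the training folds.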
Pages: 73-82
Number of pages: 10
Related papers
50 in total
  • [21] Garbage in, toxic data out: a proposal for ethical artificial intelligence sustainability impact statements
    Ronny Bogani
    Andreas Theodorou
    Luca Arnaboldi
    Robert H. Wortham
    AI and Ethics, 2023, 3 (4): 1135-1142
  • [22] Bias and Cyberbullying Detection and Data Generation Using Transformer Artificial Intelligence Models and Top Large Language Models
    Kumar, Yulia
    Huang, Kuan
    Perez, Angelo
    Yang, Guohao
    Li, J. Jenny
    Morreale, Patricia
    Kruger, Dov
    Jiang, Raymond
    ELECTRONICS, 2024, 13 (17)
  • [23] Using high throughput experimental data and in silico models to discover alternatives to toxic chromate corrosion inhibitors
    Winkler, D. A.
    Breedon, M.
    White, P.
    Hughes, A. E.
    Sapper, E. D.
    Cole, I.
    CORROSION SCIENCE, 2016, 106: 229-235
  • [24] Impact of data quality assessment on development of clinical predictive models
    Jonnagaddala, Jitendra
    Liaw, Siaw-Teng
    Ray, Pradeep
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216: 1069
  • [25] Approximation of reliability for multiple-trait animal models with missing data by canonical transformation
    Gengler, N
    Misztal, I
    JOURNAL OF DAIRY SCIENCE, 1996, 79 (02): 317-328
  • [26] Artificial Intelligence Algorithms Based on Data-driven and Knowledge-guided Models
    Jin, Zhe
    Zhang, Yin
    Wu, Fei
    Zhu, Wenwu
    Pan, Yunhe
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2023, 45 (07): 2580-2594
  • [27] Multiple Types of Missing Precipitation Data Filling Based on Ensemble Artificial Intelligence Models
    Qiu, He
    Chen, Hao
    Xu, Bingjiao
    Liu, Gaozhan
    Huang, Saihua
    Nie, Hui
    Xie, Huawei
    WATER, 2024, 16 (22)
  • [28] Characterization of Synthetic Health Data Using Rule-Based Artificial Intelligence Models
    Lenatti, Marta
    Paglialonga, Alessia
    Orani, Vanessa
    Ferretti, Melissa
    Mongelli, Maurizio
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (08): 3760-3769
  • [29] Considering the Secondary Use of Clinical and Educational Data to Facilitate the Development of Artificial Intelligence Models
    Thoma, Brent
    Spadafore, Maxwell
    Sebok-Syer, Stefanie S.
    George, Brian C.
    Chan, Teresa M.
    Krumm, Andrew E.
    ACADEMIC MEDICINE, 2024, 99 (4S): S77-S83
  • [30] Drought prediction using artificial intelligence models based on climate data and soil moisture
    Oyounalsoud, Mhamd Saifaldeen
    Yilmaz, Abdullah Gokhan
    Abdallah, Mohamed
    Abdeljaber, Abdulrahman
    SCIENTIFIC REPORTS, 2024, 14 (01)