Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer

Authors
John Adeoye
Liuling Hui
Yu-Xiong Su
Affiliations
[1] University of Hong Kong, Division of Oral and Maxillofacial Surgery, Faculty of Dentistry
[2] University of Hong Kong, Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry
Keywords
Artificial intelligence; Data-centric AI; Data quality; Head and neck cancer; Machine learning; Review
Abstract
Machine learning models are increasingly being used to model head and neck cancer outcomes to improve the screening, diagnosis, treatment, and prognostication of the disease. Because the concept of data-centric artificial intelligence is still incipient in healthcare systems, little is known about the quality of the data underlying the models proposed for clinical use. Data quality matters because it supports model generalizability and data standardization. This study therefore provides an overview of the quality of the structured and unstructured data used to construct machine learning models in head and neck cancer. Relevant studies from January 2016 to June 2022 reporting machine learning models built on structured and unstructured custom datasets were retrieved from the PubMed, EMBASE, Scopus, and Web of Science electronic databases. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to assess the quality of individual studies, after which comprehensive data quality parameters were assessed according to the type of dataset used for model construction. A total of 159 studies were included in the review: 106 used structured datasets and 53 used unstructured datasets. Data quality assessments were deliberately performed before model construction for only 14.2% of structured datasets and 11.3% of unstructured datasets. Class imbalance and data fairness were the most common data quality limitations for both types of datasets, while outliers and a lack of representative outcome classes were common limitations in structured and unstructured datasets, respectively. Furthermore, during internal validation, class imbalance reduced the discriminatory performance of models based on structured datasets, whereas higher image resolution and good class overlap resulted in better performance for models based on unstructured datasets.
Overall, data quality was infrequently assessed before the construction of ML models in head and neck cancer, irrespective of whether structured or unstructured datasets were used. To improve model generalizability, the assessments discussed in this study should be incorporated into model construction to achieve data-centric intelligent systems for head and neck cancer management.
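Two of the data quality parameters named above, class imbalance and outliers, can be checked cheaply before any model is trained. As a minimal sketch, assuming a structured dataset held as a list of Python dicts, the code below computes a class imbalance ratio (majority-class count divided by minority-class count) and flags per-feature outliers with Tukey's 1.5 × IQR fences. The function names, the record layout, and the choice of IQR fences are illustrative assumptions, not the procedure used in the review.

```python
from collections import Counter
from statistics import quantiles


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def iqr_outliers(values, k=1.5):
    """Return non-missing values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    observed = [v for v in values if v is not None]  # skip missing entries
    q1, _, q3 = quantiles(observed, n=4)  # three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in observed if v < low or v > high]


def data_quality_report(rows, label_key, numeric_keys):
    """Summarise class balance and per-feature outliers for a list of dict records.

    Hypothetical helper: label_key names the outcome field, numeric_keys the
    continuous features to screen for outliers.
    """
    report = {"imbalance_ratio": imbalance_ratio([r[label_key] for r in rows])}
    for key in numeric_keys:
        report[key] = iqr_outliers([r.get(key) for r in rows])
    return report
```

For example, `data_quality_report(rows, "outcome", ["age", "tumour_size_mm"])` (hypothetical field names) returns the imbalance ratio together with a list of flagged values for each feature. A ratio well above 1 or a non-empty outlier list could prompt resampling, reweighting, or closer inspection of the records before model construction.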