Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer

Authors
John Adeoye
Liuling Hui
Yu-Xiong Su
Affiliations
[1] University of Hong Kong,Division of Oral and Maxillofacial Surgery, Faculty of Dentistry
[2] University of Hong Kong,Division of Applied Oral Sciences and Community Dental Care, Faculty of Dentistry
Source
Journal of Big Data, 2023, 10 (01)
Keywords
Artificial intelligence; Data-centric AI; Data quality; Head and neck cancer; Machine learning; Review
Abstract
Machine learning models are increasingly considered for modeling head and neck cancer outcomes to improve screening, diagnosis, treatment, and prognostication of the disease. As the concept of data-centric artificial intelligence is still incipient in healthcare systems, little is known about the data quality of the models proposed for clinical utility. Data quality matters because it supports the generalizability of the models and data standardization. Therefore, this study provides an overview of the quality of structured and unstructured data used for machine learning model construction in head and neck cancer. Relevant studies reporting on the use of machine learning models based on structured and unstructured custom datasets between January 2016 and June 2022 were sourced from the PubMed, EMBASE, Scopus, and Web of Science electronic databases. The Prediction model Risk of Bias Assessment Tool (PROBAST) was used to assess the quality of individual studies before comprehensive data quality parameters were assessed according to the type of dataset used for model construction. A total of 159 studies were included in the review; 106 utilized structured datasets while 53 utilized unstructured datasets. Data quality assessments were deliberately performed for 14.2% of structured datasets and 11.3% of unstructured datasets before model construction. Class imbalance and data fairness were the most common limitations in data quality for both types of datasets, while outlier detection and lack of representative outcome classes were common in structured and unstructured datasets, respectively. Furthermore, this review found that class imbalance reduced the discriminatory performance of models based on structured datasets, while higher image resolution and good class overlap resulted in better model performance using unstructured datasets during internal validation.
Overall, data quality was infrequently assessed before the construction of machine learning models in head and neck cancer, irrespective of the use of structured or unstructured datasets. To improve model generalizability, the assessments discussed in this study should be introduced during model construction to achieve data-centric intelligent systems for head and neck cancer management.
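As a rough illustration of two of the structured-data checks named in the abstract (class imbalance and outlier detection), the following minimal Python sketch may help. It is not the review's own procedure: the function names `imbalance_ratio` and `iqr_outliers`, the use of Tukey's 1.5 x IQR rule, and the toy data are all assumptions made here for illustration.

```python
from collections import Counter
from statistics import quantiles


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule, assumed here)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical toy data, not taken from any study in the review.
labels = ["no_recurrence"] * 90 + ["recurrence"] * 10
ages = [52, 55, 58, 60, 61, 63, 64, 66, 70, 190]

print("imbalance ratio:", imbalance_ratio(labels))
print("suspected outliers:", iqr_outliers(ages))
```

Checks of this kind would typically run before model construction, so that a skewed outcome distribution or implausible feature values are noticed before they affect discrimination metrics.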
Related papers
50 in total
  • [1] Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
    Adeoye, John
    Hui, Liuling
    Su, Yu-Xiong
    [J]. JOURNAL OF BIG DATA, 2023, 10 (01)
  • [2] Systematic review of data-centric approaches in artificial intelligence and machine learning
    Singh P.
    [J]. Data Science and Management, 2023, 6 (03): 144 - 157
  • [3] Data-Centric Artificial Intelligence
    Jakubik, Johannes
    Voessing, Michael
    Kuehl, Niklas
    Walk, Jannis
    Satzger, Gerhard
    [J]. BUSINESS & INFORMATION SYSTEMS ENGINEERING, 2024, 66 (04): 507 - 515
  • [4] Artificial intelligence and machine learning in head and neck oncology
    Thankappan, Krishnakumar
    [J]. JOURNAL OF HEAD & NECK PHYSICIANS AND SURGEONS, 2022, 10 (02): 117 - 120
  • [5] Data-Centric Green Artificial Intelligence: A Survey
    Salehi S.
    Schmeink A.
    [J]. IEEE Transactions on Artificial Intelligence, 2024, 5 (05): 1 - 18
  • [6] Machine learning for data-centric epidemic forecasting
    Rodriguez, Alexander
    Kamarthi, Harshavardhan
    Agarwal, Pulak
    Ho, Javen
    Patel, Mira
    Sapre, Suchet
    Prakash, B. Aditya
    [J]. NATURE MACHINE INTELLIGENCE, 2024: 1122 - 1131
  • [7] A Data-Centric Optimization Framework for Machine Learning
    Rausch, Oliver
    Ben-Nun, Tal
    Dryden, Nikoli
    Ivanov, Andrei
    Li, Shigang
    Hoefler, Torsten
    [J]. PROCEEDINGS OF THE 36TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ICS 2022, 2022
  • [8] Data-centric approach to improve machine learning models for inorganic materials
    Bartel, Christopher J.
    [J]. PATTERNS, 2021, 2 (11)
  • [9] Technical Analysis of Data-Centric and Model-Centric Artificial Intelligence
    Majeed, Abdul
    Hwang, Seong Oun
    [J]. IT PROFESSIONAL, 2023, 25 (06): 62 - 70
  • [10] Assessing the Reporting Quality of Machine Learning Algorithms in Head and Neck Oncology
    Alapati, Rahul
    Renslo, Bryan
    Wagoner, Sarah F.
    Karadaghy, Omar
    Serpedin, Aisha
    Kim, Yeo Eun
    Feucht, Maria
    Wang, Naomi
    Ramesh, Uma
    Bon Nieves, Antonio
    Lawrence, Amelia
    Virgen, Celina
    Sawaf, Tuleen
    Rameau, Anais
    Bur, Andres M.
    [J]. LARYNGOSCOPE, 2024