Supervised machine learning for microbiomics: Bridging the gap between current and best practices

Authors
Dudek, Natasha Katherine [1]
Chakhvadze, Mariami [2]
Kobakhidze, Saba [2,3]
Kantidze, Omar [2]
Gankin, Yuriy [1]
Affiliations
[1] Quantori, Cambridge, MA 02142, USA
[2] Quantori, Tbilisi, Georgia
[3] Free University of Tbilisi, Tbilisi, Georgia
Keywords
Microbiome; Machine learning; Microbiomics; Bioinformatics; Artificial intelligence; Health; Diagnosis; Bias
DOI
10.1016/j.mlwa.2024.100607
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Machine learning (ML) is poised to drive innovations in clinical microbiomics, for example in disease diagnostics and prognostics. However, the successful implementation of ML in these domains requires reproducible, interpretable models that meet the rigorous performance standards set by regulatory agencies. This study identifies key areas where current ML practices in microbiomics need improvement, focusing on the gap between existing methodologies and the requirements for clinical application. To do so, we analyze 100 peer-reviewed articles published between 2021 and 2022. Within this corpus, datasets have a median size of 161.5 samples, with over one-third containing fewer than 100 samples, signaling a high potential for overfitting. Limited demographic data further raises concerns about generalizability and fairness: 24% of studies omit participants' country of residence, and attributes such as race/ethnicity, education, and income are rarely or never reported (in 11%, 2%, and 0% of studies, respectively). Methodological issues are also common; for instance, for 86% of studies we could not confidently rule out test set omission and data leakage, suggesting a strong potential for inflated performance estimates across the literature. Reproducibility is also a concern: 78% of studies do not share their ML code publicly. Based on this analysis, we provide guidance for avoiding common pitfalls that can hinder model performance, generalizability, and trustworthiness. An interactive tutorial on applying ML to microbiomics data accompanies the discussion to help establish and reinforce best practices within the community.
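To make the data-leakage concern concrete, here is a minimal, standard-library-only sketch (not the authors' tutorial code) of one common leakage pattern: selecting features on the full dataset before cross-validation. The setting loosely mimics small-sample, high-dimensional microbiome data (40 samples, 1,000 features), but every feature and label is random noise, so any accuracy well above chance is an artifact. All function names are hypothetical, and the nearest-centroid classifier is an assumption chosen only for brevity.

```python
import random
import statistics


def make_noise_data(n_samples, n_features, seed):
    """Purely random features and balanced 0/1 labels: there is no real signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)
    return X, y


def top_k_features(X, y, idx, k):
    """Rank features by absolute difference in class means, using only rows in idx."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in idx if y[i] == 1]
        b = [X[i][j] for i in idx if y[i] == 0]
        scores.append((abs(statistics.fmean(a) - statistics.fmean(b)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]


def nearest_centroid_predict(X, y, train_idx, test_idx, features):
    """Assign each test row to the class whose training centroid is closest."""
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train_idx if y[i] == label]
        centroids[label] = [statistics.fmean(r[j] for r in rows) for j in features]
    preds = []
    for i in test_idx:
        dists = {
            label: sum((X[i][j] - c) ** 2 for j, c in zip(features, centroid))
            for label, centroid in centroids.items()
        }
        preds.append(min(dists, key=dists.get))
    return preds


def cross_validated_accuracy(X, y, k_features, n_folds, leaky, seed):
    """K-fold CV accuracy; `leaky` controls where feature selection happens."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    if leaky:
        # WRONG: selection sees every sample, including the future test folds.
        features = top_k_features(X, y, range(len(y)), k_features)
    correct = 0
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        if not leaky:
            # RIGHT: selection is refit on the training folds only.
            features = top_k_features(X, y, train_idx, k_features)
        preds = nearest_centroid_predict(X, y, train_idx, test_idx, features)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / len(y)


X, y = make_noise_data(n_samples=40, n_features=1000, seed=0)
leaky = cross_validated_accuracy(X, y, k_features=10, n_folds=5, leaky=True, seed=1)
honest = cross_validated_accuracy(X, y, k_features=10, n_folds=5, leaky=False, seed=1)
print(f"leaky CV accuracy:  {leaky:.2f}")
print(f"honest CV accuracy: {honest:.2f}")
```

Because the labels are random, the honest estimate should hover around chance (0.5), while the leaky variant typically reports much higher accuracy despite there being nothing to learn. In scikit-learn, the usual remedy is to wrap feature selection and the estimator in a `Pipeline` and pass it to `cross_val_score`, so that every preprocessing step is refit within each training fold.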
Pages: 13