StratoMod: predicting sequencing and variant calling errors with interpretable machine learning

Authors
Dwarshuis, Nathan [1]
Tonner, Peter [1]
Olson, Nathan D. [1]
Sedlazeck, Fritz J. [2, 3]
Wagner, Justin [1]
Zook, Justin M. [1]
Affiliations
[1] NIST, Mat Measurement Lab, Gaithersburg, MD 20899 USA
[2] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX USA
[3] Rice Univ, Dept Comp Sci, Houston, TX USA
DOI
10.1038/s42003-024-06981-1
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Despite the variety in sequencing platforms, mappers, and variant callers, no single pipeline is optimal across the entire human genome. Therefore, developers, clinicians, and researchers need to make tradeoffs when designing pipelines for their application. Currently, assessing such tradeoffs relies on intuition about how a certain pipeline will perform in a given genomic context. We present StratoMod, which addresses this problem using an interpretable machine-learning classifier to predict germline variant calling errors in a data-driven manner. We show StratoMod can precisely predict recall using HiFi or Illumina and leverage StratoMod's interpretability to measure contributions from difficult-to-map and homopolymer regions for each respective outcome. Furthermore, we use StratoMod to assess the effect of mismapping on predicted recall using linear vs. graph-based references, and identify the hard-to-map regions where graph-based methods excelled and by how much. For these we utilize our draft benchmark based on the Q100 HG002 assembly, which contains previously inaccessible difficult regions. Furthermore, StratoMod presents a new method of predicting clinically relevant variants likely to be missed, an improvement over current pipelines, which only filter variants likely to be false. We anticipate this being useful for performing precise risk-reward analyses when designing variant calling pipelines. StratoMod is a tool to predict variant calling difficulty given a genomic context for specified sequencing platforms.
Pages: 14