StratoMod: predicting sequencing and variant calling errors with interpretable machine learning

Cited by: 1
Authors
Dwarshuis, Nathan [1 ]
Tonner, Peter [1 ]
Olson, Nathan D. [1 ]
Sedlazeck, Fritz J. [2 ,3 ]
Wagner, Justin [1 ]
Zook, Justin M. [1 ]
Affiliations
[1] NIST, Mat Measurement Lab, Gaithersburg, MD 20899 USA
[2] Baylor Coll Med, Human Genome Sequencing Ctr, Houston, TX USA
[3] Rice Univ, Dept Comp Sci, Houston, TX USA
DOI
10.1038/s42003-024-06981-1
CLC number
Q [Biological Sciences];
Subject classification codes
07; 0710; 09;
Abstract
Despite the variety of sequencing platforms, mappers, and variant callers, no single pipeline is optimal across the entire human genome. Developers, clinicians, and researchers therefore need to make tradeoffs when designing pipelines for their applications. Currently, assessing such tradeoffs relies on intuition about how a given pipeline will perform in a given genomic context. We present StratoMod, which addresses this problem with an interpretable machine-learning classifier that predicts germline variant calling errors in a data-driven manner. We show that StratoMod can precisely predict recall using HiFi or Illumina data, and we leverage StratoMod's interpretability to measure the contributions of difficult-to-map and homopolymer regions to each respective outcome. We also use StratoMod to assess the effect of mismapping on predicted recall for linear versus graph-based references, identifying the hard-to-map regions where graph-based methods excelled and by how much. For these analyses we use our draft benchmark based on the Q100 HG002 assembly, which contains previously inaccessible difficult regions. In addition, StratoMod offers a new way to predict clinically relevant variants likely to be missed, an improvement over current pipelines, which only filter variants likely to be false. We anticipate this being useful for precise risk-reward analyses when designing variant calling pipelines. StratoMod is a tool to predict variant calling difficulty given a genomic context for specified sequencing platforms.
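The abstract describes an interpretable classifier that predicts call errors from genomic-context features such as homopolymer length and mappability, with per-feature contributions that can be inspected. As a minimal, purely illustrative sketch of that idea (the feature names, thresholds, and weights below are hypothetical placeholders, not values from StratoMod), a GAM-style additive model might look like:

```python
import math

# Hypothetical additive ("GAM-style") error-prediction sketch in the spirit of
# an interpretable classifier. All weights and cutoffs are illustrative only.

def homopolymer_contrib(length):
    # Assumed: longer homopolymers raise the log-odds of a calling error.
    return 0.0 if length < 4 else 0.4 * (length - 3)

def mappability_contrib(low_mapq_frac):
    # Assumed: fraction of low-MAPQ reads at the site (0..1) raises log-odds.
    return 2.0 * low_mapq_frac

def predict_error_probability(features):
    """Sum per-feature log-odds contributions plus an intercept; apply sigmoid."""
    contribs = {
        "homopolymer": homopolymer_contrib(features["homopolymer_len"]),
        "mappability": mappability_contrib(features["low_mapq_frac"]),
    }
    logit = -3.0 + sum(contribs.values())  # negative intercept: errors are rare a priori
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, contribs  # contribs are the per-feature explanation

easy = predict_error_probability({"homopolymer_len": 2, "low_mapq_frac": 0.0})
hard = predict_error_probability({"homopolymer_len": 10, "low_mapq_frac": 0.8})
print(easy[0], hard[0])
```

Because the prediction is a sum of named per-feature terms, each call's risk decomposes into interpretable contributions, which is what enables the kind of homopolymer- and mappability-specific analyses the abstract describes.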
Pages: 14