StratoMod: predicting sequencing and variant calling errors with interpretable machine learning

Cited by: 0
Authors
Nathan Dwarshuis [1]
Peter Tonner [1]
Nathan D. Olson [1]
Fritz J. Sedlazeck [2] [3]
Justin Wagner [1]
Justin M. Zook [1]
Affiliations
[1] National Institute of Standards and Technology, Material Measurement Laboratory
[2] Baylor College of Medicine, Human Genome Sequencing Center
[3] Rice University, Department of Computer Science
DOI
10.1038/s42003-024-06981-1
Abstract
Despite the variety of sequencing platforms, mappers, and variant callers, no single pipeline is optimal across the entire human genome. Developers, clinicians, and researchers therefore need to make tradeoffs when designing pipelines for their applications. Currently, assessing such tradeoffs relies on intuition about how a given pipeline will perform in a given genomic context. We present StratoMod, which addresses this problem with an interpretable machine-learning classifier that predicts germline variant-calling errors in a data-driven manner. We show that StratoMod can precisely predict recall for HiFi or Illumina data, and we leverage StratoMod's interpretability to measure the contributions of difficult-to-map and homopolymer regions to each respective outcome. Furthermore, we use StratoMod to assess the effect of mismapping on predicted recall with linear vs. graph-based references, identifying the hard-to-map regions where graph-based methods excelled and by how much. For these analyses we use our draft benchmark based on the Q100 HG002 assembly, which contains previously inaccessible difficult regions. Finally, StratoMod offers a new way to predict which clinically relevant variants are likely to be missed, an improvement over current pipelines, which only filter variants likely to be false. We anticipate StratoMod being useful for precise risk-reward analyses when designing variant-calling pipelines.
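The abstract's core idea — an interpretable classifier mapping genomic-context features (e.g., homopolymer length, mappability) to the probability of a variant-calling error — can be illustrated with a minimal sketch. This is not StratoMod's actual model or feature set: the two feature names, the synthetic data-generating rule, and the hand-rolled logistic regression below are all invented stand-ins chosen so the example runs with only the Python standard library.

```python
# Illustrative sketch only: a tiny interpretable classifier predicting
# variant-calling errors from two hypothetical genomic-context features.
# StratoMod itself uses a different model and real pipeline features;
# everything below is synthetic.
import math
import random

random.seed(0)

def synth_example():
    # Hypothetical features: homopolymer run length (0-20 bp) and a
    # 0-1 mappability score for the surrounding region.
    hp_len = random.randint(0, 20)
    mappability = random.random()
    # Invented ground-truth rule: long homopolymers and low mappability
    # make a variant-calling error more likely.
    logit = 0.3 * hp_len - 4.0 * mappability - 1.0
    label = 1 if random.random() < 1 / (1 + math.exp(-logit)) else 0
    return [1.0, hp_len, mappability], label  # leading 1.0 = bias term

data = [synth_example() for _ in range(5000)]

# Plain logistic regression fit with full-batch gradient descent.
w = [0.0, 0.0, 0.0]
lr = 0.01
for _ in range(200):
    grad = [0.0, 0.0, 0.0]
    for x, y in data:
        p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for i in range(3):
            grad[i] += (p - y) * x[i]
    for i in range(3):
        w[i] -= lr * grad[i] / len(data)

# Interpretability: the sign and magnitude of each weight show how each
# context feature drives the predicted error probability.
print(f"bias={w[0]:.2f} homopolymer={w[1]:.2f} mappability={w[2]:.2f}")
```

In this toy setting the fitted weights recover the planted structure (positive for homopolymer length, negative for mappability), which is the kind of per-feature attribution the abstract refers to when it describes measuring contributions from difficult-to-map and homopolymer regions.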