NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data

Times cited: 4
Authors
Huang, Neng [1,2]
Xu, Minghua [1,2]
Nie, Fan [1,2]
Ni, Peng [1,2]
Xiao, Chuan-Le [3]
Luo, Feng [4]
Wang, Jianxin [1,2]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, Changsha 410083, Peoples R China
[2] Cent South Univ, Hunan Prov Key Lab Bioinformat, Changsha 410083, Peoples R China
[3] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, State Key Lab Ophthalmol, Guangzhou 510060, Peoples R China
[4] Clemson Univ, Sch Comp, Clemson, SC 29634 USA
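Funding
National Institute of Food and Agriculture (USA); National Science Foundation (USA); National Natural Science Foundation of China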
Keywords
LONG; DISCOVERY; GENOME
DOI
10.1093/bioinformatics/btac824
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Oxford Nanopore sequencing has great potential and clear advantages in population-scale studies. Because of sequencing costs, the whole-genome sequencing depth for each individual sample must be kept low. However, existing single nucleotide polymorphism (SNP) callers are designed for high-coverage Nanopore sequencing reads, and detecting SNPs in low-coverage Nanopore sequencing data remains a challenging problem.

Results: We developed NanoSNP, a novel deep learning-based method for identifying SNP sites (excluding short indels) from low-coverage Nanopore sequencing reads. NanoSNP uses a multi-step, multi-scale and haplotype-aware SNP detection pipeline. First, its pileup model uses naive pileup features to predict a subset of SNP sites with a bidirectional long short-term memory (Bi-LSTM) network. Next, these SNP sites are phased and used to divide the low-coverage Nanopore reads into haplotypes. Finally, a long-range haplotype feature and a short-range pileup feature are extracted from each haplotype, and the haplotype model combines the two features to predict the genotype of each candidate site with a Bi-LSTM network. To evaluate NanoSNP, we compared it with Clair, Clair3, Pepper-DeepVariant and NanoCaller on low-coverage (approximately 16×) Nanopore sequencing reads. We also performed cross-genome testing on six human genomes, HG002 to HG007. Comprehensive experiments demonstrate that NanoSNP outperforms Clair, Pepper-DeepVariant and NanoCaller in identifying SNPs in low-coverage Nanopore sequencing data, including in difficult-to-map regions and the major histocompatibility complex regions of the human genome. NanoSNP is comparable to Clair3 when the coverage exceeds 16×.
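This record does not describe NanoSNP's implementation, so the following is only a minimal Python sketch, under simplified assumptions, of the middle step the abstract outlines: reads are assigned to a haplotype by majority vote over phased heterozygous SNPs, and a naive pileup (base counts at one position) is then computed separately for each haplotype. The read representation (a dict from reference position to aligned base) and the function names `assign_haplotype`, `partition_reads` and `pileup_counts` are hypothetical; base qualities, alignment parsing, the long-range haplotype feature and the Bi-LSTM models are all omitted.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical simplified read: reference position -> base observed after alignment.
Read = Dict[int, str]

# Hypothetical phased heterozygous SNPs: position -> (haplotype-1 allele, haplotype-2 allele).
PhasedSnps = Dict[int, Tuple[str, str]]


def assign_haplotype(read: Read, phased: PhasedSnps, min_margin: int = 1) -> Optional[int]:
    """Return 1 or 2 if the read's alleles favour one haplotype by at least
    `min_margin` votes; return None if it covers no phased SNP or the vote ties."""
    votes = [0, 0]
    for pos, (h1, h2) in phased.items():
        base = read.get(pos)
        if base == h1:
            votes[0] += 1
        elif base == h2:
            votes[1] += 1
    if votes[0] - votes[1] >= min_margin:
        return 1
    if votes[1] - votes[0] >= min_margin:
        return 2
    return None


def partition_reads(reads: List[Read], phased: PhasedSnps) -> Dict[Optional[int], List[Read]]:
    """Split reads into haplotype 1, haplotype 2 and an unassigned (None) group."""
    groups: Dict[Optional[int], List[Read]] = {1: [], 2: [], None: []}
    for read in reads:
        groups[assign_haplotype(read, phased)].append(read)
    return groups


def pileup_counts(reads: List[Read], pos: int) -> Counter:
    """Naive pileup: count the bases the given reads show at `pos`."""
    return Counter(read[pos] for read in reads if pos in read)


if __name__ == "__main__":
    phased = {100: ("A", "G"), 250: ("C", "T")}
    reads = [
        {100: "A", 250: "C", 180: "T"},
        {100: "A", 180: "T"},
        {100: "G", 250: "T", 180: "C"},
        {250: "T", 180: "C"},
        {180: "T"},  # covers no phased SNP, so it stays unassigned
    ]
    groups = partition_reads(reads, phased)
    for hap in (1, 2):
        print(hap, dict(pileup_counts(groups[hap], 180)))
    # prints:
    # 1 {'T': 2}
    # 2 {'C': 2}
```

In this toy example a candidate site at position 180 looks like a mixed T/C pileup over all reads but resolves cleanly into one allele per haplotype, which illustrates why separating reads by haplotype can help at low coverage.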
Pages: 9