AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline

Authors
Liu, Chao [1, 2]
Wu, Pei [1, 2]
Wu, Xue [2]
Zhao, Xia [3]
Chen, Fang [3]
Cheng, Xiaofang [3]
Zhu, Hongmei [1, 2]
Wang, Ou [2]
Xu, Mengyang [2, 4]
Affiliations
[1] BGI, Tianjin, Peoples R China
[2] BGI Res, Shenzhen, Peoples R China
[3] MGI Tech, Shenzhen, Peoples R China
[4] BGI Res, Qingdao, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
long reads; bioinformatics; de novo; genome assembly; haplotype; hybrid; LONG; ACCURATE; READS;
DOI
10.3389/fgene.2024.1421565
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long-read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, reconstructing complete haplotypes remains methodologically challenging because of the high sequencing error rates of long reads and the limited capture efficiency of co-barcoded reads. Here we present AsmMix, a pipeline for generating diploid genomes that are both contiguous and accurate. It first assembles the co-barcoded reads into accurate haplotype-resolved assemblies that may contain many gaps, whereas the long-read assembly is contiguous but susceptible to errors. The two assembly sets are then integrated into haplotype-resolved assemblies with fewer misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole-genome dataset (HG002) and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Pages: 17