SeedsGraph: an efficient assembler for next-generation sequencing data

Times cited: 2
Authors
Wang, Chunyu [1]
Guo, Maozu [1]
Liu, Xiaoyan [1]
Liu, Yang [1]
Zou, Quan [2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 West Dazhi St, Harbin 150001, Peoples R China
[2] Xiamen Univ, Dept Comp Sci, Xiamen 361005, Peoples R China
Source
BMC MEDICAL GENOMICS | 2015 / Vol. 8
Funding
Specialized Research Fund for the Doctoral Program of Higher Education; National Natural Science Foundation of China;
Keywords
ALGORITHMS; GENOMES;
DOI
10.1186/1755-8794-8-S2-S13
CLC number (Chinese Library Classification)
Q3 [Genetics];
Discipline code
071007 ; 090102 ;
Abstract
DNA sequencing technology is evolving rapidly and produces a large and fast-growing volume of short reads. This has led to a resurgence of research in whole-genome shotgun assembly algorithms. Our assembly algorithm begins by clustering the short reads in a cloud computing framework; the clustering groups fragments according to their similarity to the long consensus sequence from which they originate. We condense each group of reads into a chain of seeds (a seed being a substring to which reads are aligned) and then build a graph from these chains. Finally, we search the graph for Euler paths, assemble the reads associated with each path into contigs, and use mate-pair information to lay out the contigs into scaffolds. The results show that the algorithm is efficient and feasible for large read sets such as those produced by next-generation sequencing.
Pages: 9