Generation and application of pseudo-long reads for metagenome assembly

Authors
Sim, Mikang [1]
Lee, Jongin [1]
Wy, Suyeon [1]
Park, Nayoung [1]
Lee, Daehwan [1]
Kwon, Daehong [1]
Kim, Jaebum [1]
Affiliations
[1] Konkuk Univ, Dept Biomed Sci & Engn, 120 Neungdong Ro, Seoul 05029, South Korea
Source
GIGASCIENCE | 2022 / Vol. 11
Keywords
next-generation sequencing; metagenomic assembly; pseudo-long read
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Background: Metagenomic assembly using high-throughput sequencing data is a powerful method for constructing microbial genomes from environmental samples without cultivation. However, metagenomic assembly, especially when only short reads are available, is a complex and challenging task because a metagenome is a mixture of the genomes of multiple microorganisms. Although long-read sequencing technologies have been developed and have begun to be used for metagenomic assembly, many metagenomic studies still rely on short reads because generating long reads costs more than generating short reads.
Results: In this study, we present a new method called PLR-GEN. It creates pseudo-long reads from metagenomic short reads based on given reference genome sequences, taking into account the small sequence variations that exist among individual genomes of the same or different species. When applied to a mock community data set from the Human Microbiome Project, PLR-GEN extended 101-bp short reads to pseudo-long reads with an N50 of 33 Kbp and a 0.4% error rate. Using these pseudo-long reads improved metagenomic assembly in terms of the number of sequences, assembly contiguity, and the prediction of species and genes.
Conclusions: PLR-GEN can be used to generate artificial long-read sequences without additional sequencing cost, thus aiding various studies that use metagenomes.
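The abstract reports read length as an N50 value. As a reminder of how that statistic is commonly defined (the length L such that sequences of length L or longer account for at least half of the total bases), here is a minimal Python sketch. The function name `n50` and the toy lengths are illustrative assumptions; nothing here is taken from the PLR-GEN code or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Illustrative helper (not part of PLR-GEN): sort lengths from longest
    to shortest and return the length at which the running total first
    reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up read lengths (total = 54, half = 27).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # 10 + 9 + 8 = 27 reaches half, so this prints 8
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it substantially, which is why it is usually read alongside the error rate and the total number of sequences.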
Pages: 10