PEAR: a fast and accurate Illumina Paired-End reAd mergeR

Cited by: 2928
Authors
Zhang, Jiajie [1, 2, 3]
Kobert, Kassian [1]
Flouri, Tomas [1]
Stamatakis, Alexandros [1, 4]
Affiliations
[1] Heidelberg Inst Theoret Studies, Sci Comp Grp, Exelixis Lab, D-69118 Heidelberg, Germany
[2] Med Univ Lubeck, Grad Sch Comp Med & Life Sci, D-23538 Lubeck, Germany
[3] Med Univ Lubeck, Inst Neuro & Bioinformat, D-23538 Lubeck, Germany
[4] Karlsruhe Inst Technol, Inst Theoret Informat, D-76128 Karlsruhe, Germany
Keywords
SEQUENCES; GENERATION; ALIGNMENT;
DOI
10.1093/bioinformatics/btt593
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: The Illumina paired-end sequencing technology can generate reads from both ends of target DNA fragments, which can subsequently be merged to increase the overall read length. There already exist tools for merging these paired-end reads when the target fragments are equally long. However, when fragment lengths vary and, in particular, when either the fragment size is shorter than a single-end read, or longer than twice the size of a single-end read, most state-of-the-art mergers fail to generate reliable results. Therefore, a robust tool is needed to merge paired-end reads that exhibit varying overlap lengths because of varying target fragment lengths.

Results: We present the PEAR software for merging raw Illumina paired-end reads from target fragments of varying length. The program evaluates all possible paired-end read overlaps and does not require the target fragment size as input. It also implements a statistical test for minimizing false-positive results. Tests on simulated and empirical data show that PEAR consistently generates highly accurate merged paired-end reads. A highly optimized implementation allows for merging millions of paired-end reads within a few minutes on a standard desktop computer. On multi-core architectures, the parallel version of PEAR shows linear speedups compared with the sequential version of PEAR.
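To illustrate what "evaluating all possible paired-end read overlaps" can mean in practice, here is a minimal Python sketch. It is not PEAR's implementation: the function names (`reverse_complement`, `best_overlap`, `merge_pair`), the +1/-1 match/mismatch scoring, and the `min_overlap` threshold are all assumptions chosen for illustration, and the sketch ignores base quality scores and the statistical test mentioned in the abstract. It only shows the general idea of sliding the reverse-complemented reverse read along the forward read, including negative offsets for fragments shorter than a single read.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def best_overlap(fwd, rev, min_overlap=5, match=1, mismatch=-1):
    """Score every placement of rc(rev) against fwd.

    Returns (score, offset, overlap_length) for the best placement, or None
    if no placement reaches min_overlap. `offset` is the position in fwd
    where rc(rev) starts; a negative offset means the fragment is shorter
    than a read. The scoring scheme is an illustrative assumption.
    """
    rc = reverse_complement(rev)
    best = None
    for offset in range(-(len(rc) - min_overlap), len(fwd) - min_overlap + 1):
        start = max(offset, 0)
        end = min(len(fwd), offset + len(rc))
        length = end - start
        if length < min_overlap:
            continue
        score = 0
        for i in range(start, end):
            score += match if fwd[i] == rc[i - offset] else mismatch
        if best is None or score > best[0]:
            best = (score, offset, length)
    return best


def merge_pair(fwd, rev, min_overlap=5):
    """Merge a read pair at its best-scoring overlap, or return None.

    Bases in the overlapping region are taken from the forward read; a real
    merger would typically use quality scores to resolve disagreements.
    """
    hit = best_overlap(fwd, rev, min_overlap=min_overlap)
    if hit is None or hit[0] <= 0:
        return None
    _, offset, _ = hit
    rc = reverse_complement(rev)
    fragment_end = offset + len(rc)  # fragment end in forward-read coordinates
    return fwd[:fragment_end] + rc[len(fwd) - offset:]


# Usage: a 30-base fragment sequenced with two 20-base reads (10-base overlap).
fragment = "AAAAAAAAAA" + "ACGTTGCAGT" + "CCCCCCCCCC"
forward_read = fragment[:20]
reverse_read = reverse_complement(fragment[10:])
merged = merge_pair(forward_read, reverse_read)
```

Because every offset is scored, the caller does not have to supply the expected fragment size, which mirrors the property the abstract describes. The exhaustive scan costs time proportional to the product of the two read lengths per pair.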
Pages: 614-620
Number of pages: 7