DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

Cited by: 47
Authors
Nagasaki, Hideki [1, 2]
Mochizuki, Takako [1, 2]
Kodama, Yuichi [1, 2]
Saruhashi, Satoshi [1, 2]
Morizaki, Shota [3]
Sugawara, Hideaki [1, 2]
Ohyanagi, Hajime [4]
Kurata, Nori [4]
Okubo, Kousaku [1, 2, 5]
Takagi, Toshihisa [1, 2, 5]
Kaminuma, Eli [1, 2]
Nakamura, Yasukazu [1, 2]
Affiliations
[1] Natl Inst Genet, Ctr Informat Biol, Mishima, Shizuoka 4118510, Japan
[2] Natl Inst Genet, DNA Data Bank Japan, Mishima, Shizuoka 4118510, Japan
[3] Fujisoft Inc, Chiyoda Ku, Tokyo 1010022, Japan
[4] Natl Inst Genet, Plant Genet Lab, Mishima, Shizuoka 4118510, Japan
[5] Database Ctr Life Sci, Bunkyo Ku, Tokyo 1130032, Japan
Keywords
next-generation sequencing; sequence read archive; cloud computing; analytical pipeline; genome analysis; BURROWS-WHEELER TRANSFORM; RNA-SEQ DATA; GENOME SEQUENCE; ALIGNMENT; ULTRAFAST; ASSEMBLER; VARIANTS; ARCHIVE; BIOLOGY; FORMAT;
DOI
10.1093/dnares/dst017
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
Pages: 383-390
Number of pages: 8