DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

Cited by: 47
Authors
Nagasaki, Hideki [1, 2]
Mochizuki, Takako [1, 2]
Kodama, Yuichi [1, 2]
Saruhashi, Satoshi [1, 2]
Morizaki, Shota [3]
Sugawara, Hideaki [1, 2]
Ohyanagi, Hajime [4]
Kurata, Nori [4]
Okubo, Kousaku [1, 2, 5]
Takagi, Toshihisa [1, 2, 5]
Kaminuma, Eli [1, 2]
Nakamura, Yasukazu [1, 2]
Affiliations
[1] Natl Inst Genet, Ctr Informat Biol, Mishima, Shizuoka 4118510, Japan
[2] Natl Inst Genet, DNA Data Bank Japan, Mishima, Shizuoka 4118510, Japan
[3] Fujisoft Inc, Chiyoda Ku, Tokyo 1010022, Japan
[4] Natl Inst Genet, Plant Genet Lab, Mishima, Shizuoka 4118510, Japan
[5] Database Ctr Life Sci, Bunkyo Ku, Tokyo 1130032, Japan
Keywords
next-generation sequencing; sequence read archive; cloud computing; analytical pipeline; genome analysis; BURROWS-WHEELER TRANSFORM; RNA-SEQ DATA; GENOME SEQUENCE; ALIGNMENT; ULTRAFAST; ASSEMBLER; VARIANTS; ARCHIVE; BIOLOGY; FORMAT;
DOI
10.1093/dnares/dst017
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
Pages: 383-390
Number of pages: 8