Next-generation sequencing revolution through big data analytics

Cited by: 24
Authors
Tripathi, Rashmi [1]
Sharma, Pawan [1]
Chakraborty, Pavan [2]
Varadwaj, Pritish Kumar [2]
Affiliations
[1] Indian Inst Informat Technol Allahabad, Dept Bioinformat, Allahabad, Uttar Pradesh, India
[2] Indian Inst Informat Technol Allahabad, Dept Informat Technol, Allahabad, Uttar Pradesh, India
Source
FRONTIERS IN LIFE SCIENCE | 2016 / Vol. 9 / Issue 02
Keywords
Big data; cloud computing; Hadoop; next-generation sequencing; genomics; ANALYSIS TOOL; R PACKAGE; FRAMEWORK; HADOOP; CHIP; TRANSCRIPTION; GENOMES; WEB
DOI
10.1080/21553769.2016.1178180
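Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09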
Abstract
Next-generation sequencing (NGS) technology has led to an unrivaled explosion in the amount of genomic data, and this escalation has in turn raised the challenges of sharing, archiving, integrating and analyzing these data. The scale and efficiency of NGS have posed a challenge for the analysis of these vast genomic data, gene interactions, annotations and expression studies. However, this limitation of NGS can be overcome by tools and algorithms that use a big data framework. Based on this framework, we review the current state of knowledge of big data algorithms for NGS that reveal hidden patterns in sequencing, analysis, annotation and related tasks. The Apache-based Hadoop framework provides an on-demand and adaptable environment for large-scale data analysis. It has several components for partitioning large-scale data onto clusters of commodity hardware in a fault-tolerant manner. Packages like MapReduce, Cloudburst, Crossbow, Myrna, Eoulsan, DistMap, Seal and Contrail perform various NGS applications, such as adapter trimming, quality checking, read mapping, de novo assembly, quantification, expression analysis, variant analysis, and annotation. This review covers the current applications of Hadoop technology, with their usage and limitations, from the perspective of NGS.
Pages: 119-149
Page count: 31