Next-generation sequencing revolution through big data analytics

Cited by: 24
Authors
Tripathi, Rashmi [1]
Sharma, Pawan [1]
Chakraborty, Pavan [2]
Varadwaj, Pritish Kumar [2]
Affiliations
[1] Indian Inst Informat Technol Allahabad, Dept Bioinformat, Allahabad, Uttar Pradesh, India
[2] Indian Inst Informat Technol Allahabad, Dept Informat Technol, Allahabad, Uttar Pradesh, India
Source
FRONTIERS IN LIFE SCIENCE | 2016 / Volume 9 / Issue 02
Keywords
Big data; cloud computing; Hadoop; next-generation sequencing; genomics; ANALYSIS TOOL; R PACKAGE; FRAMEWORK; HADOOP; CHIP; TRANSCRIPTION; GENOMES; WEB;
DOI
10.1080/21553769.2016.1178180
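Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes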
07 ; 0710 ; 09 ;
Abstract
Next-generation sequencing (NGS) technology has led to an unrivaled explosion in the amount of genomic data and this escalation has collaterally raised the challenges of sharing, archiving, integrating and analyzing these data. The scale and efficiency of NGS have posed a challenge for analysis of these vast genomic data, gene interactions, annotations and expression studies. However, this limitation of NGS can be safely overcome by tools and algorithms using big data framework. Based on this framework, here we have reviewed the current state of knowledge of big data algorithms for NGS to reveal hidden patterns in sequencing, analysis and annotation, and so on. The APACHE-based Hadoop framework gives an on-interest and adaptable environment for substantial scale data analysis. It has several components for partitioning of large-scale data onto clusters of commodity hardware, in a fault-tolerant manner. Packages like MapReduce, Cloudburst, Crossbow, Myrna, Eoulsan, DistMap, Seal and Contrail perform various NGS applications, such as adapter trimming, quality checking, read mapping, de novo assembly, quantification, expression analysis, variant analysis, and annotation. This review paper deals with the current applications of the Hadoop technology with their usage and limitations in perspective of NGS.
Pages: 119-149
Page count: 31