Design of Multiple Sequence Alignment Algorithms on Parallel, Distributed Memory Supercomputers

Times cited: 0
Authors
Church, Philip C. [1]
Goscinski, Andrzej [1]
Holt, Kathryn [2]
Inouye, Michael [3]
Ghoting, Amol [4]
Makarychev, Konstantin [4]
Reumann, Matthias [5]
Affiliations
[1] Deakin Univ, Geelong, Vic 3217, Australia
[2] Univ Melbourne, Dept Microbiol & Immunol, Carlton, Vic, Australia
[3] Univ Melbourne, Walter & Eliza Hall Inst Med Res, Dept Med Biol, Parkville, Vic, Australia
[4] IBM Corp, Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
[5] Univ Melbourne, IBM Res Collab Life Sci, Dept Comp Sci & Software Engn, Carlton, Vic, Australia
DOI
Not available
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline code
0831
Abstract
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common, and their sizes will only increase. Multiple sequence alignment of hundreds of genomes remains intractable because compute time and memory footprint grow quadratically. To date, most alignment algorithms have been designed for commodity clusters without parallelism. We therefore propose the design of a multiple sequence alignment algorithm for massively parallel, distributed memory supercomputers to enable comparative genomics research on large datasets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures, including sequences and sorted k-mer lists, on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that the memory footprint can be reduced enough to potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. For scaffold building, our implementation returns results matching those of the original algorithm in half the time and with a quarter of the memory footprint. In this study, we have laid the groundwork for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, enabling the comparison of hundreds, rather than a few, genome sequences within reasonable time.
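To illustrate what a sorted k-mer list is and why it is useful for finding seed matches between genomes, here is a minimal Python sketch. It is not the authors' BG/P implementation and not progressiveMauve's code: it assumes plain contiguous k-mers, a hypothetical 2-bit encoding of A/C/G/T, and skips windows containing other characters (such as N). Each genome's k-mers are encoded as integers, sorted, and two sorted lists are then merged to report every k-mer shared by both genomes.

```python
from typing import List, Tuple

# Hypothetical 2-bit code per nucleotide (an assumption, not taken from the paper).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def sorted_kmer_list(seq: str, k: int) -> List[Tuple[int, int]]:
    """Return (k-mer code, start position) pairs sorted by code.

    Windows containing a non-ACGT character (e.g. N) are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keep only the last k bases (2 bits each)
    entries = []
    code = 0
    valid = 0  # consecutive valid bases ending at the current position
    for i, base in enumerate(seq.upper()):
        b = BASE_CODE.get(base)
        if b is None:
            # Reset the rolling code after an ambiguous base.
            valid = 0
            code = 0
            continue
        code = ((code << 2) | b) & mask
        valid += 1
        if valid >= k:
            entries.append((code, i - k + 1))
    entries.sort()
    return entries


def shared_kmers(
    a: List[Tuple[int, int]], b: List[Tuple[int, int]]
) -> List[Tuple[int, int, int]]:
    """Merge two sorted k-mer lists; return (code, pos_in_a, pos_in_b) for each shared k-mer."""
    matches = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            code = a[i][0]
            # Find the run of equal codes on each side, then emit the cross product.
            i_end = i
            while i_end < len(a) and a[i_end][0] == code:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][0] == code:
                j_end += 1
            for _, pa in a[i:i_end]:
                for _, pb in b[j:j_end]:
                    matches.append((code, pa, pb))
            i, j = i_end, j_end
    return matches


if __name__ == "__main__":
    g1 = "ACGTACGTTTGCA"
    g2 = "TTGCAACGTA"
    k = 4
    for _, p1, p2 in shared_kmers(sorted_kmer_list(g1, k), sorted_kmer_list(g2, k)):
        print(g1[p1:p1 + k], p1, p2)
```

Because both lists are sorted, the merge runs in time linear in the list lengths plus the number of matches. On a distributed memory machine, one plausible (assumed, not paper-confirmed) strategy is to partition the sorted lists by k-mer code range so that each node merges its own slice independently.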
Pages: 924-927
Number of pages: 4