The Ensembl computing architecture

Cited by: 12
Authors
Cuff, JA
Coates, GMP
Cutts, TJR
Rae, M
Affiliations
[1] Wellcome Trust Sanger Inst, Cambridge CB10 1SA, England
[2] Broad Inst, Cambridge, MA 02141 USA
DOI
10.1101/gr.1866304
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.
Pages: 971-975
Number of pages: 5