EBIC: an open source software for high-dimensional and big data analyses

Cited by: 8
Authors
Orzechowski, Patryk [1, 2]
Moore, Jason H. [1]
Affiliations
[1] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[2] AGH Univ Sci & Technol, Dept Automat & Robot, PL-30059 Krakow, Poland
Funding
US National Institutes of Health
DOI
10.1093/bioinformatics/btz027
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: In this paper, we present an open source package with the latest release of Evolutionary-based BIClustering (EBIC), a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is full support for multiple graphics processing units (GPUs), which makes it possible to run large genomic data mining analyses efficiently. Further enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis. Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436 444 rows. For the largest dataset, we observed an over 6.6-fold speedup in computation time on a cluster of eight GPUs compared with running the method on a single GPU. This demonstrates the high scalability of the method.
Pages: 3181-3183
Number of pages: 3