Large-scale linear regression: Development of high-performance routines

Cited by: 10
Authors
Frank, Alvaro [1]
Fabregat-Traver, Diego [1]
Bientinesi, Paolo [1]
Affiliations
[1] Rhein Westfal TH Aachen, AICES, D-52062 Aachen, Germany
Keywords
Linear regression; Ordinary least squares; Algorithm design; Out-of-core; Parallelism; Scalability; Genome-wide association; Macular degeneration; Loci
DOI
10.1016/j.amc.2015.11.078
Chinese Library Classification
O29 [Applied mathematics]
Discipline classification code
070104
Abstract
In statistics, series of ordinary least squares (OLS) problems are used to study the linear correlation among sets of variables of interest; in many studies, the number of such variables is at least in the millions, and the corresponding datasets occupy terabytes of disk space. As the availability of large-scale datasets increases regularly, so does the challenge in dealing with them. Indeed, traditional solvers, which rely on the use of "black-box" routines optimized for one single OLS, are highly inefficient and fail to provide a viable solution for big-data analyses. As a case study, in this paper we consider a linear regression consisting of two-dimensional grids of related OLS problems that arise in the context of genome-wide association analyses, and give a careful walkthrough for the development of OLS-GRID, a high-performance routine for shared-memory architectures; analogous steps are relevant for tailoring OLS solvers to other applications. In particular, we first illustrate the design of efficient algorithms that exploit the structure of the OLS problems and eliminate redundant computations; then, we show how to effectively deal with datasets that do not fit in main memory; finally, we discuss how to cast the computation in terms of efficient kernels and how to achieve scalability. Importantly, each design decision along the way is justified by simple performance models. OLS-GRID enables the solution of 10^11 correlated OLS problems operating on terabytes of data in a matter of hours. © 2015 Elsevier Inc. All rights reserved.
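
The idea of exploiting shared structure across a grid of OLS problems can be illustrated with a minimal NumPy sketch. It is not the OLS-GRID routine itself; the abstract does not spell out the problem structure, so the sketch assumes (as is common in genome-wide association settings) that every design matrix has the form [XL | x_i], where XL is a block of covariates shared by all problems and x_i is a single varying column, and that each such matrix is regressed against every column y_j of Y. The function name `ols_grid_sketch` and all variable names are hypothetical. Under that assumption, the QR factorization of XL and the projections of XR and Y are computed once and reused for all m × t problems, instead of calling a black-box solver m × t times.

```python
import numpy as np

def ols_grid_sketch(XL, XR, Y):
    """Solve min ||Y[:, j] - [XL | XR[:, i]] b|| for every pair (i, j).

    Assumed structure (not taken from the paper): XL (n x p) is shared by
    all problems, XR (n x m) holds one varying column per problem row,
    and Y (n x t) holds one right-hand side per problem column.
    Returns B of shape (p + 1, m, t); B[:, i, j] solves problem (i, j).
    """
    n, p = XL.shape
    m = XR.shape[1]
    t = Y.shape[1]

    # Shared work, done once: reduced QR of the common block.
    Q, R = np.linalg.qr(XL)            # XL = Q R, Q is n x p
    QtXR = Q.T @ XR                    # p x m
    QtY = Q.T @ Y                      # p x t

    # Remove the part of XR and Y explained by XL.
    XR_perp = XR - Q @ QtXR            # n x m
    Y_perp = Y - Q @ QtY               # n x t

    # Coefficient of the varying column for all (i, j) via one matrix product.
    denom = np.einsum("ni,ni->i", XR_perp, XR_perp)    # squared norms, length m
    beta_r = (XR_perp.T @ Y_perp) / denom[:, None]     # m x t

    # Coefficients of the shared block: R^{-1} Q^T (y_j - x_i * beta_r[i, j]).
    rhs = QtY[:, None, :] - QtXR[:, :, None] * beta_r[None, :, :]   # p x m x t
    beta_l = np.linalg.solve(R, rhs.reshape(p, m * t)).reshape(p, m, t)

    return np.concatenate([beta_l, beta_r[None, :, :]], axis=0)
```

Because the per-problem work reduces to matrix products and one multi-right-hand-side solve, the columns of XR could in principle be processed in blocks read from disk, which is one way a computation of this shape can be adapted to data that does not fit in main memory.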
Pages: 411-421
Page count: 11