Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture

Cited by: 4
Authors
Garcia, Elkin [1]
Arteaga, Jaime [1]
Pavel, Robert [1]
Gao, Guang R. [1]
Affiliations
[1] Univ Delaware, Dept Elect & Comp Engn, CAPSL, Newark, DE 19716 USA
Keywords
OPTIMIZATION; MODEL;
DOI
10.1007/978-3-319-09967-5_14
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Power consumption and energy efficiency have become major bottlenecks in the design of new systems for high-performance computing. The path to exascale computing requires new strategies that decrease the energy consumption of modern many-core architectures without sacrificing scalability or performance. Developing these strategies demands scalable models of energy consumption and a reorientation of optimization techniques toward energy efficiency, with their trade-offs against performance evaluated explicitly. In this paper, we investigate several optimization techniques to reduce energy consumption on many-core architectures with a software-managed memory hierarchy. We study the impact of these techniques on the Static Energy and the Dynamic Energy of the LU factorization benchmark using a scalable energy consumption model. The main contributions of this paper are: (1) the modeling and analysis of energy consumption and energy efficiency for LU factorization; (2) the study and design of instruction-level and task-level optimizations for the reduction of the Static and Dynamic Energy; (3) the design and implementation of an energy-aware tiling that decreases the Dynamic Energy of power-hungry instructions in the LU factorization benchmark; and (4) the experimental evaluation of the scalability and the improvement in energy consumption and power efficiency of the proposed optimizations on the IBM Cyclops-64 many-core architecture. We study the trade-offs between performance and power efficiency for the proposed optimizations. Our results for the LU factorization benchmark, using 156 hardware thread units, show an improvement in power efficiency of between 1.68X and 4.87X across different matrix sizes. In addition, we point out examples of optimizations that scale in performance but not necessarily in power efficiency.
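The split between Static Energy and Dynamic Energy mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes the common decomposition in which static energy is a constant power draw multiplied by run time and dynamic energy is a sum of per-operation costs. The function names (`lu_in_place`, `energy_split`), the operation classes (`flop`, `mem`), the way memory accesses are counted, and all numeric values are hypothetical, chosen only for illustration.

```python
def lu_in_place(a):
    """Unblocked Doolittle LU without pivoting, overwriting `a` with L and U.

    Returns hypothetical operation counts. The 'mem' count assumes that
    a[i][k] stays in a register during the inner loop, so each update
    costs two loads and one store.
    """
    n = len(a)
    counts = {"flop": 0, "mem": 0}
    for k in range(n):
        pivot = a[k][k]
        for i in range(k + 1, n):
            a[i][k] /= pivot              # multiplier stored in the L part
            counts["flop"] += 1
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
                counts["flop"] += 2       # one multiply, one subtract
                counts["mem"] += 3        # assumed: two loads, one store
    return counts


def energy_split(counts, seconds, static_watts, joules_per_op):
    """Return (static, dynamic) energy under the assumed decomposition."""
    static = static_watts * seconds
    dynamic = sum(counts[op] * joules_per_op[op] for op in counts)
    return static, dynamic


# Usage with made-up parameters (not measurements from any real chip).
matrix = [[4.0, 3.0], [6.0, 3.0]]
ops = lu_in_place(matrix)
static_j, dynamic_j = energy_split(ops, 0.5, 2.0, {"flop": 1.0, "mem": 10.0})
```

Under a model of this shape, an optimization that shortens run time lowers the static term, while one that replaces costly memory operations with cheaper ones lowers the dynamic term, so the two can be tuned separately.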
Pages: 237-251
Number of pages: 15