Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture

Cited by: 4
Authors
Garcia, Elkin [1 ]
Arteaga, Jaime [1 ]
Pavel, Robert [1 ]
Gao, Guang R. [1 ]
Affiliation
[1] Univ Delaware, Dept Elect & Comp Engn, CAPSL, Newark, DE 19716 USA
Keywords
OPTIMIZATION; MODEL;
DOI
10.1007/978-3-319-09967-5_14
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Power consumption and energy efficiency have become a major bottleneck in the design of new systems for high performance computing. The path to exascale computing requires new strategies that decrease the energy consumption of modern many-core architectures without sacrificing scalability or performance. The development of these strategies demands the use of scalable models for energy consumption and the reorientation of optimization techniques to focus on energy efficiency, evaluating their trade-offs with respect to performance. In this paper, we investigate several optimization techniques to reduce the energy consumption on many-core architectures with a software-managed memory hierarchy. We study the impact of these techniques on the Static Energy and the Dynamic Energy of the LU factorization benchmark using a scalable energy consumption model. The main contributions of this paper are: (1) the modeling and analysis of energy consumption and energy efficiency for LU factorization; (2) the study and design of instruction-level and task-level optimizations for the reduction of the Static and Dynamic Energy; (3) the design and implementation of an energy-aware tiling that decreases the Dynamic Energy of power-hungry instructions in the LU factorization benchmark; and (4) the experimental evaluation of the scalability and improvement in terms of energy consumption and power efficiency of the proposed optimizations using the IBM Cyclops-64 many-core architecture. We study the trade-offs between performance and power efficiency for the proposed optimizations. Our results for the LU factorization benchmark, using 156 hardware thread units, show an improvement in power efficiency between 1.68X and 4.87X for different matrix sizes. In addition, we point out examples of optimizations that scale in performance but not necessarily in power efficiency.
Pages: 237-251
Page count: 15