Evaluation of EMVA using the instruction-level parallelism on Tegra X1

Citations: 0
Authors
Tominaga, Hirobumi [1 ]
Nakamura, Asuka [2 ]
Maekawa, Yoshitaka [1 ]
Affiliations
[1] Chiba Inst Technol, Dept Comp Sci, Narashino, Chiba, Japan
[2] Chiba Inst Technol, Narashino, Chiba, Japan
Keywords
Instruction-level parallelism; Random-sparse equations; Unified Memory; Tegra X1
DOI
10.1109/CANDARW.2018.00052
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Generally, solving random-sparse equations requires a direct method such as LU decomposition. This paper proposes a speed-up method, based on the extended vectorized LU factorization (EMVA) method, for solving random-sparse equations using the instruction-level parallelism of a CUDA GPU. EMVA on CUDA is known to achieve high execution efficiency [1]. However, its kernel-launch overhead is not small, because the EMVA method must launch a new kernel each time the instruction level advances. This overhead becomes smaller on an architecture that can switch smoothly between CPU execution and GPU kernels, such as the Tegra X1. The proposed method therefore selects, for each instruction level, whether to run it on the CPU or the GPU on the basis of the parallelism available at that level. Our evaluation demonstrates that the proposed method achieves a maximum speedup of about 26.5x over the existing EMVA method.
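A minimal sketch of the selection idea follows, assuming a levelized sparse LU solve. This is an illustration, not the authors' implementation: THRESHOLD, process_row, and the level layout (level_off, level_sizes) are hypothetical placeholders. Narrow levels run on the CPU; wide levels are launched as a GPU kernel, with unified memory (cudaMallocManaged) keeping the buffers visible to both processors, as on the Tegra X1.

// Minimal sketch, not the authors' code: per-level CPU/GPU dispatch for a
// levelized sparse solve. THRESHOLD and the level layout are assumptions.
#include <cuda_runtime.h>
#include <cstdio>

__host__ __device__ void process_row(int row, double *x) {
    x[row] *= 2.0;  // dummy stand-in for the elimination work of one row
}

__global__ void level_kernel(const int *rows, int n, double *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) process_row(rows[i], x);
}

int main() {
    const int THRESHOLD = 32;  // assumed cutoff below which the CPU path wins
    const int num_levels = 3;
    int level_off[]   = {0, 2, 66};   // first row index of each level
    int level_sizes[] = {2, 64, 62};  // rows (parallelism) at each level

    int *rows; double *x;
    cudaMallocManaged(&rows, 128 * sizeof(int));   // visible to CPU and GPU
    cudaMallocManaged(&x,    128 * sizeof(double));
    for (int i = 0; i < 128; ++i) { rows[i] = i; x[i] = 1.0; }

    for (int l = 0; l < num_levels; ++l) {
        int n = level_sizes[l];
        int *lv = rows + level_off[l];
        if (n < THRESHOLD) {
            for (int i = 0; i < n; ++i) process_row(lv[i], x);  // CPU path
        } else {
            level_kernel<<<(n + 255) / 256, 256>>>(lv, n, x);   // GPU path
            cudaDeviceSynchronize();  // successive levels are dependent
        }
    }
    printf("x[0] = %f\n", x[0]);
    cudaFree(rows);
    cudaFree(x);
    return 0;
}

The cudaDeviceSynchronize() after each GPU level reflects the dependence between successive instruction levels; on the Tegra X1, CPU and GPU share physical memory, so the per-level handoff between the two processors is cheap, which is the property the proposed method exploits.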
Pages: 239-242
Number of pages: 4