Evaluation of EMVA using the instruction-level parallelism on Tegra X1

Citations: 0
Authors
Tominaga, Hirobumi [1 ]
Nakamura, Asuka [2 ]
Maekawa, Yoshitaka [1 ]
Affiliations
[1] Chiba Inst Technol, Dept Comp Sci, Narashino, Chiba, Japan
[2] Chiba Inst Technol, Narashino, Chiba, Japan
Keywords
Instruction-level parallelism; Random-sparse equations; Unified Memory; Tegra X1
DOI
10.1109/CANDARW.2018.00052
CLC Number
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Generally, solving random-sparse equations requires a direct method such as LU decomposition. This paper proposes a speed-up method, based on the extended vectorized LU factorization (EMVA) method, for solving random-sparse equations using the instruction-level parallelism of a CUDA GPU. EMVA on CUDA is known to achieve high execution efficiency [1]. However, its kernel-launch overhead is not small, because the EMVA method must launch a new kernel each time the instruction level advances. This overhead becomes smaller on an architecture that can switch smoothly between CPU execution and GPU kernels, such as the Tegra X1. The proposed method therefore selects, for each instruction level, whether to run it on the CPU or the GPU on the basis of the parallelism available at that level. Our evaluation demonstrates that the proposed method achieves a maximum speedup of about 26.5x over the existing EMVA method.
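A minimal sketch of the selection idea follows, assuming a levelized sparse LU solve. This is an illustration, not the authors' implementation: THRESHOLD, process_row, and the level layout (level_off, level_sizes) are hypothetical placeholders. Narrow levels run on the CPU; wide levels are launched as a GPU kernel, with unified memory (cudaMallocManaged) keeping the buffers visible to both processors, as on the Tegra X1.

// Minimal sketch, not the authors' code: per-level CPU/GPU dispatch for a
// levelized sparse solve. THRESHOLD and the level layout are assumptions.
#include <cuda_runtime.h>
#include <cstdio>

__host__ __device__ void process_row(int row, double *x) {
    x[row] *= 2.0;  // dummy stand-in for the elimination work of one row
}

__global__ void level_kernel(const int *rows, int n, double *x) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) process_row(rows[i], x);
}

int main() {
    const int THRESHOLD = 32;  // assumed cutoff below which the CPU path wins
    const int num_levels = 3;
    int level_off[]   = {0, 2, 66};   // first row index of each level
    int level_sizes[] = {2, 64, 62};  // rows (parallelism) at each level

    int *rows; double *x;
    cudaMallocManaged(&rows, 128 * sizeof(int));   // visible to CPU and GPU
    cudaMallocManaged(&x,    128 * sizeof(double));
    for (int i = 0; i < 128; ++i) { rows[i] = i; x[i] = 1.0; }

    for (int l = 0; l < num_levels; ++l) {
        int n = level_sizes[l];
        int *lv = rows + level_off[l];
        if (n < THRESHOLD) {
            for (int i = 0; i < n; ++i) process_row(lv[i], x);  // CPU path
        } else {
            level_kernel<<<(n + 255) / 256, 256>>>(lv, n, x);   // GPU path
            cudaDeviceSynchronize();  // successive levels are dependent
        }
    }
    printf("x[0] = %f\n", x[0]);
    cudaFree(rows);
    cudaFree(x);
    return 0;
}

The cudaDeviceSynchronize() after each GPU level reflects the dependence between successive instruction levels; on the Tegra X1, CPU and GPU share physical memory, so the per-level handoff between the two processors is cheap, which is the property the proposed method exploits.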
Pages: 239-242
Number of pages: 4