Architecture optimization for multimedia application exploiting data and thread-level parallelism

Cited by: 3

Authors:

Limousin, C ^{[1]}

Sebot, J ^{[1]}

Vartanian, A ^{[1]}

Drach, N ^{[1]}

Affiliation:

[1] Univ Paris 11, LRI, F-91405 Orsay, France

Source:

JOURNAL OF SYSTEMS ARCHITECTURE | 2005, Vol. 51, No. 01

Keywords:

SIMD; SMT; superscalar processor; memory hierarchy; multimedia;

DOI:

10.1016/j.sysarc.2004.06.002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The characteristics of multimedia applications when executed on general-purpose processors are not well understood. Such knowledge is extremely important in guiding the development of multimedia applications and the design of future processors. In this paper, we characterize and optimize the performance of multimedia applications on a superscalar processor that exploits data-level parallelism and thread-level parallelism through SIMD (Single Instruction Multiple Data) and SMT (Simultaneous MultiThreading) capabilities. We show that a superscalar processor with SMT and SIMD is suitable for 3D geometry applications, and we characterize the execution in terms of the memory hierarchy, which is the main bottleneck. The results show that memory latency is not fully hidden by SMT, and that second-level data prefetching does not succeed in increasing performance. With detailed analysis, we show that this problem comes from pollution of the instruction window by threads experiencing second-level cache misses, which reduces the window available to the other threads. We therefore propose a hardware mechanism (an architecture optimization) to predict second-level misses and control this pollution. © 2004 Elsevier B.V. All rights reserved.


Pages: 15-27

Number of pages: 13

