Architecture optimization for multimedia application exploiting data and thread-level parallelism

Cited by: 3

Authors:

Limousin, C [1]

Sebot, J [1]

Vartanian, A [1]

Drach, N [1]

Affiliation:

[1] Univ Paris 11, LRI, F-91405 Orsay, France

Source:

JOURNAL OF SYSTEMS ARCHITECTURE | 2005, Vol. 51, Issue 1

Keywords:

SIMD; SMT; superscalar processor; memory hierarchy; multimedia

DOI:

10.1016/j.sysarc.2004.06.002

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The characteristics of multimedia applications when executed on general-purpose processors are not well understood. Such knowledge is extremely important in guiding the development of multimedia applications and the design of future processors. In this paper, we characterize and optimize the performance of multimedia applications on a superscalar processor that exploits data-level and thread-level parallelism through SIMD (Single Instruction Multiple Data) and SMT (Simultaneous MultiThreading) capabilities. We show that an SMT and SIMD superscalar processor is suitable for 3D geometry applications, and we characterize the execution in terms of the memory hierarchy, which is the main bottleneck. The results show that the latency is not fully hidden by SMT, and that second-level data prefetching does not succeed in increasing performance. With detailed analysis, we show that this problem comes from pollution of the instruction window by threads experiencing second-level cache misses, which reduces the window available to the other threads. We therefore propose a hardware mechanism (an architecture optimization) to predict second-level misses and control this pollution. © 2004 Elsevier B.V. All rights reserved.


Pages: 15-27

Number of pages: 13

