Architecture optimization for multimedia application exploiting data and thread-level parallelism

Cited by: 3
Authors
Limousin, C [1]
Sebot, J [1]
Vartanian, A [1]
Drach, N [1]
Affiliation
[1] Univ Paris 11, LRI, F-91405 Orsay, France
Keywords
SIMD; SMT; superscalar processor; memory hierarchy; multimedia
DOI
10.1016/j.sysarc.2004.06.002
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The characteristics of multimedia applications when executed on general-purpose processors are not well understood. Such knowledge is extremely important in guiding the development of multimedia applications and the design of future processors. In this paper, we characterize and optimize the performance of multimedia applications on a superscalar processor that exploits data-level and thread-level parallelism through SIMD (Single Instruction Multiple Data) and SMT (Simultaneous MultiThreading) capabilities. We show that an SMT and SIMD superscalar processor is well suited to 3D geometry applications, and we characterize the execution in terms of the memory hierarchy, which is the main bottleneck. The results show that memory latency is not fully hidden by SMT, and that second-level data prefetching does not succeed in increasing performance. Through detailed analysis, we show that this problem comes from pollution of the instruction window by threads experiencing second-level cache misses, which reduces the window available to the other threads. We therefore propose a hardware mechanism (an architecture optimization) to predict second-level misses and control this pollution. © 2004 Elsevier B.V. All rights reserved.
Pages: 15-27
Page count: 13