Architecture optimization for multimedia application exploiting data and thread-level parallelism

Cited by: 3
Authors
Limousin, C [1]
Sebot, J [1]
Vartanian, A [1]
Drach, N [1]
Affiliations
[1] Univ Paris 11, LRI, F-91405 Orsay, France
Keywords
SIMD; SMT; superscalar processor; memory hierarchy; multimedia
DOI
10.1016/j.sysarc.2004.06.002
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The characteristics of multimedia applications when executed on general-purpose processors are not well understood. Such knowledge is extremely important in guiding the development of multimedia applications and the design of future processors. In this paper, we characterize and optimize the performance of multimedia applications on a superscalar processor that exploits data-level and thread-level parallelism through SIMD (Single Instruction Multiple Data) and SMT (Simultaneous MultiThreading) capabilities. We show that an SMT and SIMD superscalar processor is suitable for 3D geometry applications, and we characterize the execution in terms of the memory hierarchy, which is the main bottleneck. The results show that the memory latency is not fully hidden by SMT, and that second-level data prefetching does not succeed in increasing performance. Through detailed analysis, we show that this problem comes from pollution of the instruction window by threads experiencing second-level cache misses, which reduces the window available to the other threads. We therefore propose a hardware mechanism (an architecture optimization) to predict second-level misses and control this pollution. (C) 2004 Elsevier B.V. All rights reserved.
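The abstract does not describe how the proposed mechanism works internally, so the following is only an illustrative sketch of the general idea of predicting second-level misses and using the prediction to limit instruction-window pollution. Everything in it is an assumption made for illustration: the `L2MissPredictor` class, the table of 2-bit saturating counters indexed by load address, and the `fetchable_threads` gating policy are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class L2MissPredictor:
    """Hypothetical predictor: one 2-bit saturating counter per table entry."""
    entries: int = 1024
    table: list = field(init=False)

    def __post_init__(self):
        # Every counter starts at 1, i.e. weakly predicting an L2 hit.
        self.table = [1] * self.entries

    def _index(self, pc: int) -> int:
        # Assumption: 4-byte instructions, so the two low bits carry no information.
        return (pc >> 2) % self.entries

    def predict_miss(self, pc: int) -> bool:
        # Counter values 2 and 3 predict a second-level miss.
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, missed: bool) -> None:
        # Train the counter with the actual L2 outcome of the load at this pc.
        i = self._index(pc)
        if missed:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def fetchable_threads(pending_loads: dict, predictor: L2MissPredictor) -> list:
    """Return the ids of threads allowed to fetch this cycle.

    pending_loads maps a thread id to the PCs of its in-flight loads.
    A thread is gated while any of its in-flight loads is predicted to miss
    in L2, so that it stops filling the shared instruction window.
    """
    return [
        tid
        for tid, pcs in sorted(pending_loads.items())
        if not any(predictor.predict_miss(pc) for pc in pcs)
    ]


if __name__ == "__main__":
    predictor = L2MissPredictor()
    predictor.update(0x400, missed=True)  # the load at 0x400 missed in L2 once
    print(fetchable_threads({0: [0x400], 1: [0x800]}, predictor))
```

In this toy model, a thread whose in-flight load is predicted to miss is simply skipped at fetch, leaving the window to the other threads. A real design would also have to decide when to resume the gated thread and how to handle mispredictions, which this sketch does not attempt.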
Pages: 15 - 27
Number of pages: 13