CSMT: Simultaneous Multithreading for Clustered VLIW Processors

Cited by: 3
Authors
Gupta, Manoj [1]
Sanchez, Fermin [1]
Llosa, Josep [1]
Affiliations
[1] Univ Politecn Cataluna, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain
Keywords
ILP; VLIW architectures; clustered VLIW architectures; multithreaded processors; simultaneous multithreading; ARCHITECTURE; PERFORMANCE
DOI
10.1109/TC.2009.96
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Simultaneous MultiThreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures, which is contrary to the VLIW philosophy of hardware simplicity. In this paper, we propose Cluster-level Simultaneous MultiThreading (CSMT) to allow some degree of SMT in clustered VLIW processors with low hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster as the assignment unit. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The hardware required to implement CSMT is cheap, realistic, and practical for a clustered VLIW processor. An analysis of the hardware required to implement CSMT shows that it is quite scalable, with up to eight threads easily supported at low hardware cost. The experimental results show that CSMT significantly improves performance when compared with other multithreading approaches suited for VLIW. For instance, with four threads, CSMT shows an average speedup of 110 percent over a single-thread VLIW architecture and 40 percent over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 225 percent over single-thread architecture and 84 percent over IMT.
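The cluster-level assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' hardware design: it assumes each thread's next VLIW instruction is summarized as a bitmask of the clusters it uses, that threads are considered in fixed priority order, and that cluster renaming is a fixed per-thread rotation of cluster indices. The names `rename` and `merge_cycle` are hypothetical.

```python
from typing import List


def rename(mask: int, shift: int, n_clusters: int) -> int:
    """Rotate a cluster-usage bitmask left by `shift` positions.

    Assumption: renaming is modeled as a fixed per-thread rotation.
    """
    shift %= n_clusters
    full = (1 << n_clusters) - 1
    return ((mask << shift) | (mask >> (n_clusters - shift))) & full


def merge_cycle(masks: List[int], n_clusters: int) -> List[int]:
    """Return the indices of the threads that issue in this cycle.

    masks[t] is the cluster-usage bitmask of thread t's next instruction.
    A thread issues only if none of its (renamed) clusters is already
    taken by a higher-priority thread, so the cluster is the unit of
    assignment rather than the individual operation.
    """
    occupied = 0
    issued = []
    for t, mask in enumerate(masks):
        renamed = rename(mask, t, n_clusters)  # thread t rotates by t
        if renamed & occupied == 0:
            occupied |= renamed
            issued.append(t)
    return issued


if __name__ == "__main__":
    # Two threads that both use only cluster 0 before renaming.
    print(merge_cycle([0b0001, 0b0001], n_clusters=4))
```

In this sketch, two threads that would collide on cluster 0 are spread onto clusters 0 and 1 by the rotation, so both can issue in the same cycle. A rotation can also introduce a conflict that did not exist before renaming, which is why this is only a model of the idea and not a statement about the measured results.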
Pages: 385-399
Number of pages: 15