CSMT: Simultaneous Multithreading for Clustered VLIW Processors

Cited by: 3

Authors:

Gupta, Manoj [1]

Sanchez, Fermin [1]

Llosa, Josep [1]

Affiliations:

[1] Univ Politecn Cataluna, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2010 / Vol. 59 / Issue 03

Keywords:

ILP; VLIW architectures; clustered VLIW architectures; multithreaded processors; simultaneous multithreading; ARCHITECTURE; PERFORMANCE;

DOI:

10.1109/TC.2009.96

Chinese Library Classification number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Simultaneous MultiThreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures, which is contrary to the VLIW philosophy of hardware simplicity. In this paper, we propose Cluster-level Simultaneous MultiThreading (CSMT) to allow some degree of SMT in clustered VLIW processors with low hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster as the assignment unit. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The hardware required to implement CSMT is cheap, realistic, and practical for a clustered VLIW processor. An analysis of the hardware required to implement CSMT shows that it is quite scalable, with up to eight threads easily supported at low hardware cost. The experimental results show that CSMT significantly improves performance when compared with other multithreading approaches suited for VLIW. For instance, with four threads, CSMT shows an average speedup of 110 percent over a single-thread VLIW architecture and 40 percent over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 225 percent over single-thread architecture and 84 percent over IMT.
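The cluster-level merging idea in the abstract can be illustrated with a small sketch. The abstract does not specify the hardware mechanism, so everything below is an assumption made for illustration: each thread's instruction is summarized as a per-cluster usage bit vector, renaming is modeled as rotating that vector by a fixed per-thread offset, and threads are merged greedily in priority order. The names `rename` and `merge_cycle` are hypothetical and do not come from the paper.

```python
from typing import List


def rename(usage: List[bool], offset: int) -> List[bool]:
    """Rotate a thread's cluster-usage vector by `offset` clusters.

    Assumption: renaming is modeled as a simple rotation of cluster ids.
    """
    n = len(usage)
    return [usage[(c - offset) % n] for c in range(n)]


def merge_cycle(usages: List[List[bool]], offsets: List[int]) -> List[int]:
    """Return the ids of the threads that can issue together in one cycle.

    Threads are considered in index order (lower index = higher priority).
    A thread issues only if none of its renamed clusters is already taken,
    so the whole per-cluster operation set is the unit of assignment.
    """
    n = len(usages[0])
    busy = [False] * n
    issued = []
    for tid, (usage, off) in enumerate(zip(usages, offsets)):
        renamed = rename(usage, off)
        if not any(b and r for b, r in zip(busy, renamed)):
            busy = [b or r for b, r in zip(busy, renamed)]
            issued.append(tid)
    return issued


# Two threads that both use clusters 0 and 1 of a 4-cluster machine.
threads = [[True, True, False, False], [True, True, False, False]]
print(merge_cycle(threads, [0, 0]))  # no renaming: the threads collide
print(merge_cycle(threads, [0, 2]))  # thread 1 shifted to clusters 2 and 3
```

In this toy model, the two threads collide on clusters 0 and 1 when no renaming is applied, while giving the second thread an offset of 2 moves its operations to clusters 2 and 3 so that both can issue in the same cycle.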

Pages: 385-399

Page count: 15
