CSMT: Simultaneous Multithreading for Clustered VLIW Processors

Cited by: 3
Authors
Gupta, Manoj [1]
Sanchez, Fermin [1]
Llosa, Josep [1]
Affiliation
[1] Univ Politecn Cataluna, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain
Keywords
ILP; VLIW architectures; clustered VLIW architectures; multithreaded processors; simultaneous multithreading; ARCHITECTURE; PERFORMANCE;
DOI
10.1109/TC.2009.96
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Simultaneous MultiThreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures, which is contrary to the VLIW philosophy of hardware simplicity. In this paper, we propose Cluster-level Simultaneous MultiThreading (CSMT) to allow some degree of SMT in clustered VLIW processors with low hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster as the assignment unit. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The hardware required to implement CSMT is cheap, realistic, and practical for a clustered VLIW processor. An analysis of the hardware required to implement CSMT shows that it is quite scalable, with up to eight threads easily supported at low hardware cost. The experimental results show that CSMT significantly improves performance when compared with other multithreading approaches suited for VLIW. For instance, with four threads, CSMT shows an average speedup of 110 percent over a single-thread VLIW architecture and 40 percent over Interleaved MultiThreading (IMT). In some cases, speedup can be as high as 225 percent over single-thread architecture and 84 percent over IMT.
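The cluster-level merging idea in the abstract can be illustrated with a small sketch. This is not the authors' hardware design: it assumes a four-cluster machine, represents each thread's current VLIW instruction as a bitmask of the clusters it uses, merges threads greedily in a fixed priority order, and models cluster renaming as a per-thread rotation of cluster indices. The names `NUM_CLUSTERS`, `rename`, and `merge_cycle` are hypothetical and chosen only for illustration.

```python
NUM_CLUSTERS = 4  # assumed cluster count, not taken from the paper


def rename(mask, offset, n=NUM_CLUSTERS):
    """Rotate a cluster-usage bitmask left by `offset` positions (mod n).

    Assumption: renaming is modeled as a fixed per-thread rotation.
    """
    offset %= n
    full = (1 << n) - 1
    return ((mask << offset) | (mask >> (n - offset))) & full


def merge_cycle(masks, offsets=None, n=NUM_CLUSTERS):
    """Return the ids of threads that can issue together in one cycle.

    masks[i] is the set of clusters used by thread i's instruction.
    Threads are considered in index order (assumed fixed priority); a
    thread issues only if none of its renamed clusters is already taken.
    """
    if offsets is None:
        offsets = [0] * len(masks)
    busy = 0
    issued = []
    for tid, mask in enumerate(masks):
        renamed = rename(mask, offsets[tid], n)
        if renamed & busy == 0:  # no cluster conflict with earlier threads
            busy |= renamed
            issued.append(tid)
    return issued


# Two threads that both use clusters 0 and 1.
same = [0b0011, 0b0011]
print(merge_cycle(same))          # without renaming
print(merge_cycle(same, [0, 2]))  # thread 1 rotated by two clusters
```

In this toy model the two threads collide on clusters 0 and 1 when no renaming is applied, so only the higher-priority thread issues. Rotating the second thread by two clusters moves its operations to clusters 2 and 3, and both threads issue in the same cycle.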
Pages: 385-399
Page count: 15