Instruction scheduling for a clustered VLIW processor with a word-interleaved cache

Cited by: 0
Authors
Gibert, Enric
Sanchez, Jesus
Gonzalez, Antonio
Affiliations
[1] Univ Politecn Cataluna, Dept Arquitectura Computadors, ES-08034 Barcelona, Spain
[2] Univ Politecn Cataluna, Intel Labs, Intel Barcelona Res Ctr, Barcelona, Spain
Source
Keywords
instruction scheduling; modulo scheduling; clustered VLIW processor; distributed cache; partitioned cache;
DOI
10.1002/cpe.1013
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Clustering is a common technique for overcoming the wire-delay problem that arises as technology evolves. Fully distributed architectures, in which the register file, the functional units and the data cache are all partitioned, are particularly effective at dealing with these constraints and are also very scalable. This paper presents effective instruction scheduling techniques for a clustered VLIW processor with a word-interleaved cache. These techniques rely on (i) loop unrolling and variable alignment to increase the fraction of local accesses, (ii) a latency assignment process to schedule memory instructions with an appropriate latency, and (iii) different heuristics to assign memory instructions to clusters. Memory consistency is guaranteed by constraining the assignment of memory instructions to clusters. The use of Attraction Buffers is also introduced. An Attraction Buffer is a hardware mechanism that allows some data replication in order to increase the number of local accesses and, consequently, reduce stall time. Performance results for the Mediabench benchmark suite demonstrate the effectiveness of the presented techniques and mechanisms. The scheduling techniques increase the number of local accesses by more than 25%, while Attraction Buffers reduce stall time by more than 30%. Finally, IPC for this architecture is 10% and 5% better than that of a clustered VLIW processor with a centralized/unified data cache, depending on the scheduling heuristic. Copyright (c) 2006 John Wiley & Sons, Ltd.
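As an illustration of why loop unrolling and variable alignment can raise the fraction of local accesses in a word-interleaved cache, the following is a minimal sketch and not the authors' implementation. It assumes the simplest interleaving, in which consecutive words map to consecutive clusters; the word size, the cluster count, and the function names `home_cluster` and `clusters_touched` are hypothetical choices made for this example only.

```python
WORD_BYTES = 4    # assumed word size (not stated in the abstract)
NUM_CLUSTERS = 4  # assumed number of clusters (not stated in the abstract)


def home_cluster(addr, word_bytes=WORD_BYTES, num_clusters=NUM_CLUSTERS):
    """Cluster whose cache module holds the word containing byte address `addr`,
    assuming consecutive words are interleaved across clusters."""
    return (addr // word_bytes) % num_clusters


def clusters_touched(base, stride_bytes, iterations, unroll=1):
    """For a strided load in a loop unrolled `unroll` times, return one set per
    unrolled copy of the load, holding the home clusters that copy accesses."""
    touched = [set() for _ in range(unroll)]
    for i in range(iterations):
        # Original iteration i is executed by unrolled copy (i % unroll).
        touched[i % unroll].add(home_cluster(base + i * stride_bytes))
    return touched


# Without unrolling, the single load visits every cluster over time.
rolled = clusters_touched(base=0, stride_bytes=4, iterations=16)

# Unrolled by the number of clusters, each copy always hits one cluster,
# so it can be scheduled on that cluster and access its data locally.
unrolled = clusters_touched(base=0, stride_bytes=4, iterations=16, unroll=4)

# A different base alignment shifts which cluster each copy maps to, which is
# why the mapping is only predictable when the alignment is known.
shifted = clusters_touched(base=8, stride_bytes=4, iterations=16, unroll=4)

print(rolled, unrolled, shifted)
```

In this sketch the unroll factor equals the assumed cluster count and the stride is one word; other strides or unroll factors would give a different, possibly non-local, pattern.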
Pages: 1391 - 1411
Number of pages: 21