Exploiting execution locality with a decoupled kilo-instruction processor

Times cited: 0
Authors
Pericas, Miquel [1 ,2 ]
Cristal, Adrian [2 ]
Gonzalez, Ruben [1 ]
Jimenez, Daniel A. [3 ]
Valero, Mateo [1 ,2 ]
Affiliations
[1] Tech Univ Catalonia, Comp Architecture Dept, Jordi Girona 1-3, Modul D6 Campus Nord, Barcelona 08034, Spain
[2] BSC, ES-08034 Barcelona, Spain
[3] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Source
HIGH-PERFORMANCE COMPUTING | 2008 / Vol. 4759
DOI
Not available
Chinese Library Classification
TP301 [Theory, methods]
Subject classification code
081202
Abstract
Overcoming increasing memory latency is one of the main problems that microprocessor designers have faced over the years. The two basic techniques introduced to mitigate latency are caches and out-of-order execution. However, neither of these solutions is adequate for hiding off-chip memory accesses on the order of 200 cycles or more. Theoretically, increasing the size of the instruction window would allow much longer latencies to be hidden, but scaling the structures to support thousands of in-flight instructions would be prohibitively expensive. However, the distribution of instruction issue times in the presence of L2 cache misses is highly correlated. This paper describes this phenomenon of Execution Locality and shows how it can be exploited with an inexpensive microarchitecture consisting of two linked cores. This Decoupled Kilo-Instruction Processor (D-KIP) is very effective in recovering lost potential performance. Extensive simulations show that speed-ups of up to 379% are possible for numerical benchmarks thanks to the exploitation of impressive degrees of Memory-Level Parallelism (MLP) and the execution of independent instructions in the shadow of L2 misses.
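The abstract's argument can be illustrated with a toy dataflow model. This is a minimal sketch, not the authors' simulator or the D-KIP mechanism itself. The names `Instr`, `window_needed` and `dataflow_schedule`, the one-cycle non-miss latency, the 4-wide issue width and the sample trace are all hypothetical. Only the roughly 200-cycle off-chip latency comes from the abstract.

The sketch does three things:

* It gives a back-of-the-envelope estimate of how many in-flight instructions are needed to keep issuing during one miss.
* It computes unbounded-window issue times.
* It splits instructions by whether they depend on an L2 miss.

The resulting issue times cluster into two groups, which loosely mirrors the idea of giving miss-independent work to one core and miss-dependent work to another.

```python
from dataclasses import dataclass

MEM_LATENCY = 200  # off-chip access latency, order of magnitude taken from the abstract
HIT_LATENCY = 1    # hypothetical: every non-missing instruction completes in one cycle


@dataclass
class Instr:
    name: str
    srcs: tuple = ()       # names of producer instructions (hypothetical encoding)
    l2_miss: bool = False  # True for a load that misses in the L2 cache


def window_needed(issue_width, latency):
    # Rough estimate: instructions fetched while one miss is outstanding.
    return issue_width * latency


def dataflow_schedule(trace):
    """Return {name: (issue_cycle, depends_on_miss)} for an unbounded window."""
    done = {}   # name -> cycle at which its result becomes available
    dep = {}    # name -> True if the value (transitively) comes from an L2 miss
    sched = {}
    for ins in trace:
        # Issue as soon as all producers have finished (pure dataflow limit).
        issue = max((done[s] for s in ins.srcs), default=0)
        on_miss = any(dep[s] for s in ins.srcs)
        sched[ins.name] = (issue, on_miss)
        dep[ins.name] = on_miss or ins.l2_miss
        done[ins.name] = issue + (MEM_LATENCY if ins.l2_miss else HIT_LATENCY)
    return sched


# Hypothetical trace: two independent misses overlap (memory-level parallelism),
# while i1/i2/i3 are independent work that can run in the shadow of the misses.
trace = [
    Instr("ld_a", l2_miss=True),
    Instr("ld_b", l2_miss=True),
    Instr("i1"),
    Instr("i2", ("i1",)),
    Instr("use_a", ("ld_a",)),
    Instr("use_ab", ("use_a", "ld_b")),
    Instr("i3", ("i2",)),
]

sched = dataflow_schedule(trace)
cache_side = sorted(t for t, m in sched.values() if not m)  # miss-independent
memory_side = sorted(t for t, m in sched.values() if m)     # miss-dependent

print(window_needed(4, MEM_LATENCY))  # 800
print(cache_side)                     # [0, 0, 0, 1, 2]
print(memory_side)                    # [200, 201]
```

In this toy trace, the miss-independent instructions all issue within the first few cycles, and the miss-dependent ones issue together once the data returns. A 4-wide machine would need about 800 in-flight instructions to cover a single 200-cycle miss. Real workloads, cache behavior and the paper's actual microarchitecture are far more involved than this illustration.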
Pages: 56-+
Number of pages: 3