Decoupled Processors Architecture for Accelerating Data Intensive Applications using Scratch-Pad Memory Hierarchy

Authors
Milidonis, Athanasios [1]
Alachiotis, Nikolaos [1]
Porpodas, Vasileios [1]
Michail, Harris [1]
Panagiotakopoulos, Georgios [1]
Kakarountas, Athanasios P. [1]
Goutis, Costas E. [1]
Affiliation
[1] Univ Patras, VLSI Design Lab, Dept Elect & Comp Engn, Patras, Greece
Keywords
Decoupled; Scratch pad
DOI
10.1007/s11265-009-0393-9
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present an architecture of decoupled processors with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits the more efficient prefetching of decoupled processors, which make use of the parallelism between address computation and application data processing that mainly exists in streaming applications. This benefit, combined with the ability of scratch-pad memories to store data with no conflict misses and low energy per access, contributes significantly to increasing the system's performance. The application code is split into two parallel programs. The first runs on the Access processor and computes the addresses of the data in the memory hierarchy. The second processes the application data and runs on the Execute processor, a processor with a limited address space: just the register file addresses. Each transfer of any block in the memory hierarchy, up to the Execute processor's register file, is controlled by the Access processor and the DMA units. This strongly differentiates the architecture from traditional uniprocessors and from existing decoupled processors with cache memory hierarchies. The architecture is compared in performance with (a) uniprocessor architectures with scratch-pad memory hierarchies, (b) uniprocessor architectures with cache memory hierarchies, and (c) existing decoupled architectures, and it shows higher normalized performance. The reason for this gain is the efficiency of data transfer that the scratch-pad memory hierarchy provides, combined with the ability of the decoupled processors to eliminate memory latency by using memory management techniques for transferring data instead of fixed prefetching methods. Experimental results show that performance increases by up to almost 2 times compared to uniprocessor architectures with scratch-pad memories and by up to 3.7 times compared to those with caches. The proposed architecture achieves this performance without penalties in energy-delay product.
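The split of one application into an Access program and an Execute program can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the strided access pattern, and the use of a software queue in place of the DMA units and register file are all assumptions made for illustration, and the two programs run one after the other here, whereas in the described architecture they run in parallel.

```python
from collections import deque


def access_program(memory, start, stride, count):
    """Hypothetical Access-side program.

    It only computes addresses and moves data. It never performs
    arithmetic on the data values themselves.
    """
    operands = deque()
    for i in range(count):
        address = start + i * stride      # address computation
        operands.append(memory[address])  # stands in for a DMA transfer
    return operands


def execute_program(operands):
    """Hypothetical Execute-side program.

    It consumes operands in arrival order and never sees a memory
    address, mirroring an address space limited to the register file.
    """
    total = 0
    while operands:
        total += operands.popleft()
    return total


# Assumed example data: a main memory where each cell holds its own index.
memory = list(range(100))
operands = access_program(memory, start=4, stride=8, count=5)
result = execute_program(operands)
print(result)
```

The point of the separation is that all knowledge of where data lives stays on the Access side, so the data movement can be scheduled ahead of the computation that uses it.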
Pages: 281-296
Page count: 16
Source
Journal of Signal Processing Systems, 2010, 59: 281-296