Hybrid Memory Buffer Microarchitecture for High-Radix Routers

Cited by: 2
Authors
Li, Cunlu [1, 2]
Dong, Dezun [1, 2]
Liao, Xiangke [1, 2]
Kim, John [3]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Collaborat Innovat Ctr High Performance Comp, Changsha 410073, Peoples R China
[3] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
Keywords
Random access memory; Switches; Organizations; Microarchitecture; Ports (computers); Magnetic tunneling; System-on-chip; Hierarchical router; STT-MRAM; high-radix router; ARCHITECTURE; PERFORMANCE; INTERCONNECT; ENERGY; CACHE;
DOI
10.1109/TC.2021.3076431
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Hierarchical high-radix router microarchitectures built around small SRAM-based intermediate buffers are used in the interconnection networks of large-scale supercomputers. While a hierarchical organization scales efficiently to higher switch port counts, it requires intermediate buffers, which can become a performance bottleneck. Shallow intermediate buffers can cause head-of-line blocking, which creates backpressure towards the input buffers and reduces overall performance. Increasing the intermediate buffer size overcomes this problem but becomes infeasible due to the large overhead. In this work, we propose to organize the decentralized intermediate buffers as a centralized buffer and to leverage an alternative memory technology to increase its capacity. In particular, we exploit the high density of Spin-Torque Transfer Magnetic RAM (STT-MRAM) to increase intermediate buffer depth while also providing near-zero leakage power. STT-MRAM has disadvantages, however, such as higher write latency and higher write energy. To overcome these disadvantages, we propose DeepHiR, a novel deep hybrid buffer organization (STT-MRAM and SRAM) combined with a centralized buffer organization to provide high performance at minimal cost. Although the deep intermediate buffer provided by DeepHiR effectively improves router performance, the large amount of input buffering still incurs substantial hardware overhead. At the same time, deeper intermediate buffers lengthen the time it takes for backpressure to propagate to the source node, thereby reducing the performance of DeepHiR. Therefore, we further propose ElasHiR, which leverages an elastic input buffer design in the centralized row buffer to allow part of the centralized row buffer to act as an input buffer. ElasHiR adopts reduced input buffers and automatically determines the length of the input buffer in the centralized row buffer. This design minimizes buffer resources while achieving excellent efficiency.
Evaluation results show that DeepHiR achieves a 56.7 percent improvement in packet latency under synthetic traffic at a moderate cost in energy and area. ElasHiR reduces the input buffer by 93.8 percent with performance comparable to DeepHiR.
Pages: 2888-2902
Number of pages: 15