Hybrid Memory Buffer Microarchitecture for High-Radix Routers

Cited by: 2
Authors
Li, Cunlu [1 ,2 ]
Dong, Dezun [1 ,2 ]
Liao, Xiangke [1 ,2 ]
Kim, John [3 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Collaborat Innovat Ctr High Performance Comp, Changsha 410073, Peoples R China
[3] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
Keywords
Random access memory; Switches; Organizations; Microarchitecture; Ports (computers); Magnetic tunneling; System-on-chip; Hierarchical router; STT-MRAM; high-radix router; ARCHITECTURE; PERFORMANCE; INTERCONNECT; ENERGY; CACHE;
DOI
10.1109/TC.2021.3076431
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Hierarchical high-radix router microarchitectures built from small SRAM-based intermediate buffers are used in the interconnection networks of large-scale supercomputers. While hierarchical organization enables efficient scaling to higher switch port counts, it requires intermediate buffers, which can become a performance bottleneck. Shallow intermediate buffers can cause head-of-line blocking, which creates backpressure towards the input buffers and reduces overall performance. Increasing the intermediate buffer size overcomes this problem but is infeasible because of the large overhead. In this work, we propose organizing the decentralized intermediate buffers as a centralized buffer and leveraging an alternative memory technology to increase its capacity. In particular, we exploit the high density of Spin-Transfer Torque Magnetic RAM (STT-MRAM) to increase intermediate buffer depth while also providing near-zero leakage power. STT-MRAM has disadvantages, such as higher write latency and higher write energy. To overcome them, we propose DeepHiR, a novel deep hybrid buffer organization (STT-MRAM and SRAM) combined with a centralized buffer organization to provide high performance at minimal cost. Although the deep intermediate buffer provided by DeepHiR effectively improves router performance, the large amount of input buffering still incurs substantial hardware overhead. At the same time, deeper intermediate buffers make backpressure take longer to propagate to the source node, which reduces the performance of DeepHiR. We therefore further propose ElasHiR, which uses an elastic input buffer design in the centralized row buffer so that part of the centralized row buffer can act as input buffer. ElasHiR adopts reduced input buffers and automatically determines the length of the input buffer in the centralized row buffer. This design minimizes buffer resources while achieving excellent efficiency.
Evaluation results show that DeepHiR achieves a 56.7 percent improvement in packet latency under synthetic traffic at a moderate cost in energy and area. ElasHiR reduces the input buffer by 93.8 percent with performance comparable to DeepHiR.
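The abstract's point that shallow intermediate buffers push backpressure upstream can be illustrated with a toy single-queue model. This is only a sketch under stated assumptions, not the paper's simulator or the DeepHiR design: the function name `count_backpressure_cycles`, the fixed drain period, and the bursty arrival pattern are all hypothetical choices made for illustration.

```python
def count_backpressure_cycles(depth, arrivals, drain_period):
    """Toy model (assumption, not from the paper) of one intermediate buffer.

    depth        -- buffer capacity in flits
    arrivals     -- list with the number of flits offered in each cycle
    drain_period -- the buffer forwards one flit every `drain_period` cycles

    Returns the number of cycles in which at least one flit had to wait
    upstream because the buffer was full, i.e. cycles with backpressure.
    """
    occupancy = 0  # flits currently held in the intermediate buffer
    waiting = 0    # flits held upstream because the buffer was full
    stalled = 0    # cycles in which backpressure was asserted

    for cycle, new_flits in enumerate(arrivals):
        # Forward one flit downstream on drain cycles.
        if cycle % drain_period == 0 and occupancy > 0:
            occupancy -= 1

        # Offer all pending flits; accept only as many as fit.
        waiting += new_flits
        accepted = min(waiting, depth - occupancy)
        occupancy += accepted
        waiting -= accepted

        if waiting > 0:
            stalled += 1

    return stalled


# A burst of 8 flits followed by 8 idle cycles, drained at half the arrival rate.
burst = [1] * 8 + [0] * 8
shallow = count_backpressure_cycles(depth=2, arrivals=burst, drain_period=2)
deep = count_backpressure_cycles(depth=8, arrivals=burst, drain_period=2)
print("stalled cycles, depth 2:", shallow)
print("stalled cycles, depth 8:", deep)
```

In this toy setting the deeper buffer absorbs the whole burst, while the shallow one stalls the upstream side for several cycles. The model ignores write latency, energy, and the delayed propagation of backpressure that the abstract identifies as the cost of deeper buffers.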
Pages: 2888-2902
Number of pages: 15