Hybrid Memory Buffer Microarchitecture for High-Radix Routers

Cited by: 2
Authors
Li, Cunlu [1, 2]
Dong, Dezun [1, 2]
Liao, Xiangke [1, 2]
Kim, John [3]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc, Changsha 410073, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Collaborat Innovat Ctr High Performance Comp, Changsha 410073, Peoples R China
[3] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
Keywords
Random access memory; Switches; Organizations; Microarchitecture; Ports (computers); Magnetic tunneling; System-on-chip; Hierarchical router; STT-MRAM; high-radix router; ARCHITECTURE; PERFORMANCE; INTERCONNECT; ENERGY; CACHE
DOI
10.1109/TC.2021.3076431
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Hierarchical high-radix router microarchitectures built from small SRAM-based intermediate buffers have been used in the interconnection networks of large-scale supercomputers. While the hierarchical organization scales efficiently to higher switch port counts, it requires intermediate buffers that can become a performance bottleneck. Shallow intermediate buffers can cause head-of-line blocking, which creates backpressure towards the input buffers and reduces overall performance. Increasing the intermediate buffer size overcomes this problem but becomes infeasible because of the large overhead. In this work, we propose to organize the decentralized intermediate buffers as a centralized buffer and to leverage an alternative memory technology to increase its capacity. In particular, we exploit the high density of Spin-Transfer Torque Magnetic RAM (STT-MRAM) to increase intermediate buffer depth while also providing near-zero leakage power. STT-MRAM has disadvantages, namely higher write latency and higher write energy. To overcome them, we propose DeepHiR, a novel deep hybrid buffer organization (STT-MRAM and SRAM) combined with a centralized buffer organization to provide high performance at minimal cost. Although the deep intermediate buffer provided by DeepHiR effectively improves router performance, the large input buffers still incur substantial hardware overhead. At the same time, deeper intermediate buffers lengthen the time backpressure takes to propagate to the source node, which reduces the performance of DeepHiR. Therefore, we further propose ElasHiR, which uses an elastic input buffer design in the centralized row buffer so that part of the centralized row buffer can act as input buffer. ElasHiR adopts reduced input buffers and automatically determines the length of the input buffer within the centralized row buffer. This design minimizes buffer resources while remaining highly efficient.
Evaluation results show that DeepHiR improves packet latency by 56.7 percent under synthetic traffic at a moderate cost in energy and area. ElasHiR can reduce the input buffer by 93.8 percent with performance comparable to DeepHiR.
Pages: 2888-2902
Page count: 15