Computing En-Route for Near-Data Processing

Cited by: 4
Authors
Huang, Jiayi [1]
Majumder, Pritam [2]
Kim, Sungkeun [2]
Fulton, Troy [3]
Puli, Ramprakash Reddy [4]
Yum, Ki Hwan [2]
Kim, Eun Jung [2]
Affiliations
[1] Univ Calif Santa Barbara, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
[2] Texas A&M Univ, Dept Comp Sci & Engn, College Stn, TX 77845 USA
[3] Aspen Insights, Ft Worth, TX 76107 USA
[4] NVIDIA, Santa Clara, CA 95051 USA
Keywords
Bandwidth; Parallel processing; Kernel; Optimization; Random access memory; Memory management; Fabrics; Memory network; data-flow; in-network computing; near-data processing; processing-in-memory
DOI
10.1109/TC.2021.3063378
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The data explosion and the demand for faster data analysis have spawned emerging applications that operate over massive amounts of data and exhibit large memory footprints with low data reuse. Such characteristics lead to enormous data movement across the memory hierarchy and put significant pressure on modern communication fabrics and memory subsystems. To mitigate the widening gap between high processor computation density and insufficient memory bandwidth, memory networks and near-data processing techniques have been proposed to keep improving system performance and energy efficiency. In this article, we propose Active-Routing, an in-network near-data processing architecture for data-flow execution, which enables computation en route by exploiting patterns of aggregation over intermediate results. The proposed architecture leverages massive memory cube-level and vault-level parallelism, as well as network concurrency, to optimize aggregation operations along a dynamically built Active-Routing Tree. It also introduces page-granular computation offloading to amortize the offloading overhead and improve throughput. Compared to the state-of-the-art processing-in-memory architecture, evaluations show that the baseline Active-Routing achieves up to 7x speedup with an average performance improvement of 60 percent, and reduces the energy-delay product by 80 percent across various benchmarks. Further optimizations with vault-level parallelism and page-granular offloading achieve an additional order-of-magnitude improvement.
Pages: 906-921
Number of pages: 16