SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations

Cited by: 19
Authors
Kal, Hongju [1]
Lee, Seokmin [1]
Ko, Gun [1]
Ro, Won Woo [1]
Affiliations
[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea
Keywords
Recommendation System; Embedding Layer; Locality; Heterogeneous Memory; Near Memory Processing; LONG TAIL; DRAM;
DOI
10.1109/ISCA52012.2021.00059
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Personalized recommendation systems have become a major AI application in modern data centers. The main challenges in processing personalized recommendation inferences are the large memory footprint and high bandwidth requirement of embedding layers. To overcome the capacity limit and bandwidth congestion of on-chip memory, near memory processing (NMP) can be a promising solution. Recent work on accelerating personalized recommendations proposes a DIMM-based NMP design that solves the bandwidth problem and increases memory capacity. The performance of NMP is determined by the internal bandwidth, so the prior DIMM-based approach uses more DIMMs to achieve higher operation throughput. However, increasing the number of DIMMs could eventually lead to significant power consumption because it scales inefficiently. We propose SPACE, a novel heterogeneous memory architecture that is efficient in both performance and energy. SPACE combines a compute-capable 3D-stacked DRAM with DIMMs for personalized recommendations. Before designing the proposed system, we give a quantitative analysis of user/item interactions and define two localities: gather locality and reduction locality. In gather operations, we find that only a small proportion of items are highly accessed by users, and we call this gather locality. We define reduction locality as the reusability of the gathered items in reduction operations. Based on gather locality, SPACE allocates highly accessed embedding items to the 3D-stacked DRAM to achieve the maximum bandwidth. Then, by exploiting reduction locality, SPACE uses the remaining space of the 3D-stacked DRAM to store and reuse repeated partial sums, thereby minimizing the required number of element-wise reduction operations. The evaluation shows that SPACE achieves a 3.2x performance improvement and a 56% energy saving over the previous DIMM-based NMPs by leveraging a 3D-stacked DRAM that is 1/8 the size of the DIMMs. Compared to state-of-the-art DRAM cache designs with the same NMP configuration, SPACE achieves an average performance improvement of 32.7%.
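The two localities named in the abstract can be illustrated with a small software sketch. This is not the SPACE hardware design or its allocation policy; it is a toy model under stated assumptions: queries are non-empty lists of unique item IDs, the "fast tier" simply holds the most frequently gathered items, and the partial-sum cache is a precomputed dictionary. All function names (`pick_hot_items`, `fast_hit_ratio`, `reduce_query`) are hypothetical.

```python
from collections import Counter

def pick_hot_items(queries, fast_capacity):
    """Gather locality: choose the most frequently gathered item IDs
    as candidates for the small, high-bandwidth memory tier."""
    counts = Counter(item for query in queries for item in query)
    return {item for item, _ in counts.most_common(fast_capacity)}

def fast_hit_ratio(queries, hot):
    """Fraction of all gather accesses that land in the fast tier."""
    total = sum(len(query) for query in queries)
    hits = sum(1 for query in queries for item in query if item in hot)
    return hits / total

def reduce_query(query, table, psum_cache):
    """Reduction locality: element-wise sum of the embedding rows in `query`,
    reusing any cached partial sum whose item set is contained in the query.
    Returns the reduced vector and the number of vector additions performed.
    Assumes `query` is non-empty and contains unique item IDs."""
    remaining = set(query)
    vectors = []
    for items, psum in psum_cache.items():
        if items <= remaining:          # cached combination is reusable
            vectors.append(psum)
            remaining -= items
    vectors.extend(table[i] for i in sorted(remaining))
    result = list(vectors[0])
    for vec in vectors[1:]:
        result = [a + b for a, b in zip(result, vec)]
    return result, len(vectors) - 1

# Toy data (illustrative only): items 0 and 1 appear in every query.
queries = [[0, 1, 5], [0, 1, 7], [0, 1, 2], [3, 0, 1]]
table = {i: [float(i), 1.0] for i in range(8)}

hot = pick_hot_items(queries, fast_capacity=2)
psum_cache = {frozenset({0, 1}): [a + b for a, b in zip(table[0], table[1])]}

print(hot, fast_hit_ratio(queries, hot))
print(reduce_query(queries[0], table, psum_cache))
print(reduce_query(queries[0], table, {}))
```

In this toy trace, placing only two of the eight items in the fast tier captures most gather accesses, and reusing the cached sum of items 0 and 1 removes one vector addition per query. The real system makes these decisions in hardware and with policies the abstract does not detail.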
Pages: 679 - 691
Page count: 13