SPACE: Locality-Aware Processing in Heterogeneous Memory for Personalized Recommendations

Cited by: 19

Authors:

Kal, Hongju [1]

Lee, Seokmin [1]

Ko, Gun [1]

Ro, Won Woo [1]

Affiliations:

[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea

Source:

2021 ACM/IEEE 48TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2021) | 2021

Keywords:

Recommendation System; Embedding Layer; Locality; Heterogeneous Memory; Near Memory Processing; LONG TAIL; DRAM;

DOI:

10.1109/ISCA52012.2021.00059

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Personalized recommendation systems have become a major AI application in modern data centers. The main challenges in processing personalized recommendation inferences are the large memory footprint and high bandwidth requirement of embedding layers. To overcome the capacity limit and bandwidth congestion of on-chip memory, near memory processing (NMP) can be a promising solution. Recent work on accelerating personalized recommendations proposes a DIMM-based NMP design to solve the bandwidth problem and increase memory capacity. The performance of NMP is determined by the internal bandwidth, and the prior DIMM-based approach utilizes more DIMMs to achieve higher operation throughput. However, extending the number of DIMMs could eventually lead to significant power consumption due to inefficient scaling. We propose SPACE, a novel heterogeneous memory architecture, which is efficient in terms of performance and energy. SPACE exploits a compute-capable 3D-stacked DRAM with DIMMs for personalized recommendations. Prior to designing the proposed system, we give a quantitative analysis of the user/item interactions and define two localities: gather locality and reduction locality. In gather operations, we find that only a small proportion of items are highly accessed by users, and we call this gather locality. Also, we define reduction locality as the reusability of the gathered items in reduction operations. Based on the gather locality, SPACE allocates highly accessed embedding items to the 3D-stacked DRAM to achieve the maximum bandwidth. Subsequently, by exploiting reduction locality, we utilize the remaining space of the 3D-stacked DRAM to store and reuse repeated partial sums, thereby minimizing the required number of element-wise reduction operations. As a result, the evaluation shows that SPACE achieves a 3.2x performance improvement and 56% energy saving over the previous DIMM-based NMPs while leveraging a 3D-stacked DRAM that is 1/8 the size of the DIMMs. Also, compared to the state-of-the-art DRAM cache designs with the same NMP configuration, SPACE achieves an average performance improvement of 32.7%.
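
The two localities named in the abstract can be illustrated with a small Python sketch. This is not the SPACE design or its hardware mapping. It is a toy software model under stated assumptions: queries are lists of distinct item ids, "hot" items are simply the most frequently gathered ones, and cached partial sums are restricted to item pairs. All function names (`hot_items`, `hot_hit_rate`, `frequent_pairs`, `reduce_query`) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def hot_items(queries, capacity):
    """Gather locality: pick the `capacity` most frequently gathered item ids.

    In this toy model these are the items that would be placed in the
    small, high-bandwidth memory tier (assumption for illustration).
    """
    counts = Counter(item for q in queries for item in q)
    return {item for item, _ in counts.most_common(capacity)}


def hot_hit_rate(queries, hot):
    """Fraction of all gather accesses that are served by the hot set."""
    total = sum(len(q) for q in queries)
    hits = sum(1 for q in queries for item in q if item in hot)
    return hits / total


def frequent_pairs(queries, top_k):
    """Reduction locality: find the item pairs that co-occur most often."""
    counts = Counter()
    for q in queries:
        counts.update(combinations(sorted(set(q)), 2))
    return [pair for pair, _ in counts.most_common(top_k)]


def reduce_query(query, table, psum_cache):
    """Sum-pool the embeddings of a non-empty query.

    Cached pair partial sums replace two lookups and one addition each.
    Returns (pooled vector, number of element-wise vector additions).
    """
    remaining = set(query)
    parts = []
    for (a, b), psum in psum_cache.items():
        if a in remaining and b in remaining:
            parts.append(psum)
            remaining -= {a, b}
    parts.extend(table[i] for i in sorted(remaining))
    result = list(parts[0])
    for vec in parts[1:]:
        result = [x + y for x, y in zip(result, vec)]
    return result, len(parts) - 1


if __name__ == "__main__":
    # Toy workload: item 0 and item 1 are accessed far more than the rest.
    queries = [[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2, 5]]
    table = {i: [float(i), 1.0] for i in range(6)}

    hot = hot_items(queries, capacity=2)
    print("hot set:", hot, "hit rate:", hot_hit_rate(queries, hot))

    # Precompute partial sums for the most frequent pair and reuse them.
    cache = {
        (a, b): [x + y for x, y in zip(table[a], table[b])]
        for a, b in frequent_pairs(queries, top_k=1)
    }
    for q in queries:
        print(q, reduce_query(q, table, cache))
```

In this toy workload, a hot set of only two items covers more than half of all gather accesses, and caching a single pair's partial sum removes one vector addition from every query that contains both items. The real system's allocation policy and partial-sum selection may differ from this simplification.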

Citation

Pages: 679-691

Page count: 13
