A Compiler for Automatic Selection of Suitable Processing-in-Memory Instructions

Authors
Ahmed, Hameeza [1]
Santos, Paulo C. [2]
Lima, Joao P. C. [2]
Moura, Rafael F. [2]
Alves, Marco A. Z. [3]
Beck, Antonio C. S. [2]
Carro, Luigi [2]
Affiliations
[1] NED Univ, Dept Comp & Informat Syst Engn, Karachi, Pakistan
[2] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[3] Univ Fed Parana, Dept Informat, Curitiba, Parana, Brazil
Keywords
Compiler; Processing in Memory; Near-data computing; Vector instructions; SIMD; 3D-Stacked memories;
DOI
10.23919/date.2019.8714956
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Processing-in-Memory (PIM) is not a new technique, but the advent of 3D-stacked technologies, which integrate large memories with logic circuitry able to process large amounts of data, has revived it. PIM increases performance while reducing energy consumption when dealing with large amounts of data. Although several PIM designs are available in the literature, using them effectively still burdens the programmer. In addition, multiple PIM instances are required to take advantage of the internals of 3D-stacked memories, which further increases the challenges programmers face. This work therefore presents the Processing-In-Memory cOmpiler (PRIMO). The compiler efficiently exploits large vector units on a PIM architecture directly from the original code. PRIMO automatically selects suitable PIM operations, enabling their automatic offloading. Moreover, PRIMO accounts for multiple PIM instances, selecting the most suitable instance while reducing internal communication between different PIM units. Compilation results for different benchmarks show that PRIMO exploits large vectors while achieving near-optimal performance compared to the ideal execution for the case-study PIM. PRIMO achieves a speedup of 38x for specific kernels and an average speedup of 11.8x across a set of benchmarks from the PolyBench suite.
Pages: 564-569
Page count: 6