DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations

Cited by: 0
Authors
Soria-Pardos, Victor [1]
Armejach, Adria [2]
Muck, Tiago [3]
Suarez Gracia, Dario [4]
Joao, Jose A. [3]
Rico, Alejandro [5]
Moreto, Miquel [2]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Univ Politecn Cataluna, Barcelona Supercomp Ctr, Barcelona, Spain
[3] Arm, Austin, TX USA
[4] Univ Zaragoza, Zaragoza, Spain
[5] AMD, Austin, TX USA
Keywords
multi-core architectures; microarchitecture; atomic memory operations; data placement; barrier synchronization; architecture; communication; SPLASH-2
DOI
10.1145/3579371.3589065
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate this overhead, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates currently available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select where an AMO executes (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09x across all workloads and 1.31x on AMO-intensive applications with respect to executing all AMOs near.
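As a rough illustration of what a static placement policy keyed on coherence state might look like, the sketch below maps a cache block's state to a near or far decision. The abstract does not describe the paper's actual policies, so the MESI-style state names, the `unique_near_policy` rule, and all identifiers are assumptions made for illustration; only the "all AMOs near" baseline is taken from the abstract.

```python
from enum import Enum


class State(Enum):
    """MESI-style coherence states (assumed; the abstract names no protocol)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class Placement(Enum):
    NEAR = "near"  # execute the AMO at the requesting core
    FAR = "far"    # execute the AMO remotely in the on-chip memory hierarchy


def all_near_policy(state: State) -> Placement:
    """Baseline named in the abstract: every AMO executes near."""
    return Placement.NEAR


def unique_near_policy(state: State) -> Placement:
    """Hypothetical static policy: go near only if the core already holds
    the block with write permission, otherwise send the AMO far."""
    if state in (State.MODIFIED, State.EXCLUSIVE):
        return Placement.NEAR
    return Placement.FAR


if __name__ == "__main__":
    for s in State:
        print(s.name, all_near_policy(s).value, unique_near_policy(s).value)
```

A static rule of this kind depends only on the block's current state. A predictor such as the one the abstract describes would additionally use observed locality patterns, which this sketch does not attempt to model.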
Pages: 420-432
Number of pages: 13