DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations

Authors
Soria-Pardos, Victor [1]
Armejach, Adria [2]
Muck, Tiago [3]
Suarez Gracia, Dario [4]
Joao, Jose A. [3]
Rico, Alejandro [5]
Moreto, Miquel [2]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Univ Politecn Cataluna, Barcelona Supercomp Ctr, Barcelona, Spain
[3] Arm, Austin, TX USA
[4] Univ Zaragoza, Zaragoza, Spain
[5] AMD, Austin, TX USA
Keywords
multi-core architectures; microarchitecture; atomic memory operations; data placement; barrier synchronization; architecture; communication; SPLASH-2
DOI
10.1145/3579371.3589065
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near the core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates the static AMO execution policies currently implemented in multi-core Systems-on-Chip (SoC) designs, which select where an AMO executes (near or far) based on the coherence state of the cache block. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute each AMO. DynAMO identifies different locality patterns to make informed decisions, reducing AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric-mean speedups of 1.09x across all workloads and 1.31x on AMO-intensive applications relative to executing all AMOs near.
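To make the near/far distinction concrete, the Python sketch below contrasts a static policy keyed on the coherence state of the block with a dynamic, learned placement decision. Both are illustrative assumptions: the simplified MESI states, the `static_unique_near` rule, the `ToyPlacementPredictor` class, its PC-indexed 2-bit saturating counters, and the `reused_locally` training signal are hypothetical and do not reproduce the paper's three static policies or DynAMO's actual design.

```python
from enum import Enum


class State(Enum):
    """Simplified MESI states of the requesting core's private copy (assumption)."""
    I = "invalid"
    S = "shared"
    E = "exclusive"
    M = "modified"


class Placement(Enum):
    NEAR = "near"  # execute in the core's private cache after gaining unique ownership
    FAR = "far"    # ship the operation to a shared level of the hierarchy


def static_unique_near(state: State) -> Placement:
    """Hypothetical static policy: execute near only if the block is already
    held with write permission (E or M); otherwise execute far."""
    return Placement.NEAR if state in (State.E, State.M) else Placement.FAR


class ToyPlacementPredictor:
    """Hypothetical dynamic predictor, NOT DynAMO's design.

    Keeps a table of 2-bit saturating counters indexed by the AMO's program
    counter. A counter moves up when the block was reused locally by the same
    core before another core touched it, and down otherwise.
    """

    def __init__(self, entries: int = 256) -> None:
        self.entries = entries
        self.counters = [1] * entries  # start weakly biased toward FAR

    def _index(self, pc: int) -> int:
        return pc % self.entries

    def predict(self, pc: int) -> Placement:
        # Counter values 2..3 predict NEAR, 0..1 predict FAR.
        return Placement.NEAR if self.counters[self._index(pc)] >= 2 else Placement.FAR

    def train(self, pc: int, reused_locally: bool) -> None:
        i = self._index(pc)
        if reused_locally:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    print(static_unique_near(State.S))  # Placement.FAR
    pred = ToyPlacementPredictor()
    pc = 0x400A10  # hypothetical AMO instruction address
    print(pred.predict(pc))  # Placement.FAR (counter starts at 1)
    pred.train(pc, reused_locally=True)
    print(pred.predict(pc))  # Placement.NEAR
```

A table of saturating counters is used here only because it is the simplest way to show a per-instruction placement decision learned from observed locality; the abstract states that DynAMO identifies locality patterns, not how it does so.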
Pages: 420-432
Number of pages: 13