DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations

Cited by: 0

Authors:

Soria-Pardos, Victor [1]

Armejach, Adria [2]

Muck, Tiago [3]

Suarez Gracia, Dario [4]

Joao, Jose A. [3]

Rico, Alejandro [5]

Moreto, Miquel [2]

Affiliations:

[1] Barcelona Supercomp Ctr, Barcelona, Spain

[2] Univ Politecn Cataluna, Barcelona Supercomp Ctr, Barcelona, Spain

[3] Arm, Austin, TX USA

[4] Univ Zaragoza, Zaragoza, Spain

[5] AMD, Austin, TX USA

Source:

PROCEEDINGS OF THE 2023 THE 50TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, ISCA 2023 | 2023

Keywords:

multi-core architectures; microarchitecture; atomic memory operations; data placement; BARRIER SYNCHRONIZATION; ARCHITECTURE; COMMUNICATION; SPLASH-2;

DOI:

10.1145/3579371.3589065

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

With increasing core counts in modern multi-core designs, the overhead of synchronization jeopardizes the scalability and efficiency of parallel applications. To mitigate these overheads, modern cache-coherent protocols offer support for Atomic Memory Operations (AMOs) that can be executed near-core (near) or remotely in the on-chip memory hierarchy (far). This paper evaluates currently available static AMO execution policies implemented in multi-core Systems-on-Chip (SoC) designs, which select AMOs' execution placement (near or far) based on the cache block coherence state. We propose three static policies and show that the performance of static policies is application dependent. Moreover, we show that one of our proposed static policies outperforms currently available implementations. Furthermore, we propose DynAMO, a predictor that selects the best location to execute the AMOs. DynAMO identifies the different locality patterns to make informed decisions, improving AMO latency and increasing overall throughput. DynAMO outperforms the best-performing static policy and provides geometric mean speed-ups of 1.09x across all workloads and 1.31x on AMO-intensive applications with respect to executing all AMOs near.

Citation

Pages: 420 - 432

Number of pages: 13

50 items in total

[31] Extracting Memory-Level Parallelism through Reconfigurable Hardware Traces
Lin, Mingjie
Cheng, Shaoyi
Wawrzynek, John
2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013
[32] Simulation and Architecture Improvements of Atomic Operations on GPU Scratchpad Memory
van den Braak, Gert-Jan
Gomez-Luna, Juan
Corporaal, Henk
Gonzalez-Linares, Jose Maria
Guil, Nicolas
2013 IEEE 31ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2013, : 357 - 362
[33] Code placement for improving dynamic branch prediction accuracy
Jiménez, DA
ACM SIGPLAN NOTICES, 2005, 40 (06) : 107 - 116
[34] Improving the Robustness of Reservoir Operations with Stochastic Dynamic Programming
Kim, Gi Joo
Kim, Young-Oh
Reed, Patrick M.
JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT, 2021, 147 (07)
[35] DYNAMO - A PORTABLE TOOL FOR DYNAMIC LOAD BALANCING ON DISTRIBUTED-MEMORY MULTICOMPUTERS
TARNVIK, E
CONCURRENCY-PRACTICE AND EXPERIENCE, 1994, 6 (08) : 613 - 639
[36] BLPP: Improving the Performance of GPGPUs with Heterogeneous Memory through Bandwidth- and Latency-Aware Page Placement
Kim, Kyu Yeun
Baek, Woongki
2018 IEEE 36TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2018, : 358 - 365
[37] Exploiting mixed-mode parallelism for matrix operations on the HERA architecture through reconfiguration
Wang, X.
Ziavras, S. G.
IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 2006, 153 (04) : 249 - 260
[38] Dynamic application placement under service and memory constraints
Kimbrel, T
Steinder, M
Sviridenko, M
Tantawi, A
EXPERIMENTAL AND EFFICIENT ALGORITHMS, PROCEEDINGS, 2005, 3503 : 391 - 402
[39] Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
Łukasz Jarząbek
Paweł Czarnul
The Journal of Supercomputing, 2017, 73 : 5378 - 5401
[40] Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications
Jarzabek, Lukasz
Czarnul, Pawel
JOURNAL OF SUPERCOMPUTING, 2017, 73 (12) : 5378 - 5401
