Cache Line Aware Algorithm Design for Cache-Coherent Architectures

Cited by: 9

Authors:

Ramos, Sabela [1]

Hoefler, Torsten [1]

Affiliations:

[1] Swiss Fed Inst Technol, Scalable Parallel Comp Lab, Dept Comp Sci, Zurich, Switzerland

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2016, Vol. 27, No. 10

Keywords:

Cache coherence; shared memory; communication algorithms; performance modeling; Xeon Phi; Sandy Bridge; MODEL;

DOI:

10.1109/TPDS.2016.2516540

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

The increase in the number of cores per processor and the complexity of memory hierarchies make cache coherence key for programmability of current shared memory systems. However, ignoring its detailed architectural characteristics can harm performance significantly. In order to assist performance-centric programming, we propose a methodology to allow semi-automatic performance tuning with the systematic translation from an algorithm to an analytic performance model for cache line transfers. For this, we design a simple interface for cache line aware optimization, a translation methodology, and a full performance model that exposes the block-based design of caches to middleware designers. We investigate two different architectures to show the applicability of our techniques and methods: the many-core accelerator Intel Xeon Phi and a multi-core processor with a NUMA configuration (Intel Sandy Bridge). We use mathematical optimization techniques to tune synchronization algorithms to the microarchitectures, identifying three techniques to design and optimize data transfers in our model: single-use, single-step broadcast, and private cache lines.


Pages: 2824-2837

Page count: 14
