Cache Line Aware Algorithm Design for Cache-Coherent Architectures

Cited by: 9
Authors
Ramos, Sabela [1]
Hoefler, Torsten [1]
Affiliations
[1] Swiss Fed Inst Technol, Scalable Parallel Comp Lab, Dept Comp Sci, Zurich, Switzerland
Keywords
Cache coherence; shared memory; communication algorithms; performance modeling; Xeon Phi; Sandy Bridge; MODEL;
DOI
10.1109/TPDS.2016.2516540
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The increase in the number of cores per processor and the complexity of memory hierarchies make cache coherence key for programmability of current shared memory systems. However, ignoring its detailed architectural characteristics can harm performance significantly. In order to assist performance-centric programming, we propose a methodology to allow semi-automatic performance tuning with the systematic translation from an algorithm to an analytic performance model for cache line transfers. For this, we design a simple interface for cache line aware optimization, a translation methodology, and a full performance model that exposes the block-based design of caches to middleware designers. We investigate two different architectures to show the applicability of our techniques and methods: the many-core accelerator Intel Xeon Phi and a multi-core processor with a NUMA configuration (Intel Sandy Bridge). We use mathematical optimization techniques to tune synchronization algorithms to the microarchitectures, identifying three techniques to design and optimize data transfers in our model: single-use, single-step broadcast, and private cache lines.
Pages: 2824 - 2837
Page count: 14
Related papers
  • [1] Dynamic Verification of Memory Consistency in Cache-Coherent Multithreaded Computer Architectures
    Meixner, Albert
    Sorin, Daniel J.
    [J]. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2009, 6 (01) : 18 - 31
  • [2] Dynamic verification of memory consistency in cache-coherent multithreaded computer architectures
    Meixner, Albert
    Sorin, Daniel J.
    [J]. DSN 2006 INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS, PROCEEDINGS, 2006, : 73 - 82
  • [3] Impact of switch design on the application performance of cache-coherent multiprocessors
    Bhuyan, L
    Wang, H
    Iyer, R
    Kumar, A
    [J]. FIRST MERGED INTERNATIONAL PARALLEL PROCESSING SYMPOSIUM & SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING, 1998, : 466 - 474
  • [4] Scaling application performance on a cache-coherent multiprocessor
    Jiang, DM
    Singh, JP
    [J]. PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, 1999, : 305 - 316
  • [5] Modular Specification and Verification of a Cache-Coherent Interface
    McMillan, Kenneth
    [J]. PROCEEDINGS OF THE 2016 16TH CONFERENCE ON FORMAL METHODS IN COMPUTER-AIDED DESIGN (FMCAD 2016), 2016, : 109 - 116
  • [6] Accelerating Wait-Free Algorithms: Pragmatic Solutions on Cache-Coherent Multicore Architectures
    Wang, Junchang
    Jin, Qi
    Fu, Xiong
    Li, Yun
    Shi, Peichang
    [J]. IEEE ACCESS, 2019, 7 : 74653 - 74669
  • [7] CCNoC: Cache-Coherent Network on Chip for Chip Multiprocessors
    王惊雷
    薛一波
    王海霞
    李崇民
    汪东升
    [J]. Journal of Computer Science & Technology, 2010, 25 (02) : 257 - 266
  • [8] CCNoC: Cache-Coherent Network on Chip for Chip Multiprocessors
    Jing-Lei Wang
    Yi-Bo Xue
    Hai-Xia Wang
    Chong-Min Li
    Dong-Sheng Wang
    [J]. Journal of Computer Science and Technology, 2010, 25 : 257 - 266
  • [9] Cache-Coherent Accelerators for Persistent Memory Crash Consistency
    Bhardwaj, Ankit
    Thornley, Todd
    Pawar, Vinita
    Achermann, Reto
    Zellweger, Gerd
    Stutsman, Ryan
    [J]. PROCEEDINGS OF THE 2022 14TH ACM WORKSHOP ON HOT TOPICS IN STORAGE AND FILE SYSTEMS, HOTSTORAGE 2022, 2022, : 37 - 44
  • [10] A model of pipelined mutual exclusion on cache-coherent multiprocessors
    Takesue, M
    [J]. EURO-PAR 2003 PARALLEL PROCESSING, PROCEEDINGS, 2003, 2790 : 917 - 922