Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling

Cited by: 7
Authors
Caheny, Paul [1, 3]
Casas, Marc [1, 3]
Moreto, Miguel [1, 3]
Gloaguen, Herve [2]
Saintes, Maxime [2]
Ayguade, Eduard [1, 3]
Labarta, Jesus [1, 3]
Valero, Mateo [1, 3]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Bull Atos Technol, Les Clayes Sous Bois, France
[3] Univ Politecn Cataluna, Dept Arquitectura Comp, Barcelona, Spain
Funding
EU Horizon 2020
Keywords
Cache Coherence; NUMA; Task-based programming models; ARCHITECTURE;
DOI
10.1145/2967938.2967962
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol, combined with runtime-managed NUMA-aware scheduling and data allocation techniques that make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.
Pages: 275-286
Page count: 12
Related papers
50 in total
  • [41] An adaptive limited pointers directory scheme for cache coherence of scalable multiprocessors
    Park, CH
    Choi, JH
    Park, KH
    Park, D
    EURO-PAR'99: PARALLEL PROCESSING, 1999, 1685 : 753 - 756
  • [42] SelectDirectory: A Selective Directory for Cache Coherence in Many-Core Architectures
    Yao, Yuan
    Wang, Guanhua
    Ge, Zhiguo
    Mitra, Tulika
    Chen, Wenzhi
    Zhang, Naxin
    2015 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2015, : 175 - 180
  • [43] NUMA-Aware Thread Scheduling for Big Data Transfers over Terabits Network Infrastructure
    Kim, Taeuk
    Khan, Awais
    Kim, Youngjae
    Kasu, Preethika
    Atchley, Scott
    SCIENTIFIC PROGRAMMING, 2018, 2018
  • [44] Reducing cache misses for CC-NUMA by careful page-mapping
    Huang, J
    Li, ZY
    INTERNATIONAL SOCIETY FOR COMPUTERS AND THEIR APPLICATIONS 10TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING SYSTEMS, 1997, : 417 - 421
  • [45] Reducing cache traffic and energy with macro data load
    Jin, Lei
    Cho, Sangyeun
    ISLPED '06: PROCEEDINGS OF THE 2006 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2006, : 147 - 150
  • [46] Energy aware cache coherence protocol for chip-multiprocessors
    Ahmed, Rana Ejaz
    2006 Canadian Conference on Electrical and Computer Engineering, Vols 1-5, 2006, : 1366 - 1369
  • [47] Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures
    Manivannan, Madhavan
    Stenstrom, Per
    2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [48] A second-level cache with the distance-aware replacement policy for NUMA systems
    Chung, SW
    Shin, JH
    Kim, HS
    Jhon, CS
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2002, 18 (05) : 803 - 813
  • [49] NUMA-aware Scheduling and Memory Allocation for data-flow task-parallel Applications
    Drebes, Andi
    Pop, Antoniu
    Heydemann, Karine
    Drach, Nathalie
    Cohen, Albert
    ACM SIGPLAN NOTICES, 2016, 51 (08) : 391 - 392
  • [50] Cache What You Need to Cache: Reducing Write Traffic in Cloud Cache via "One-Time-Access-Exclusion" Policy
    Wang, Hua
    Zhang, Jiawei
    Huang, Ping
    Yi, Xinbo
    Cheng, Bin
    Zhou, Ke
    ACM TRANSACTIONS ON STORAGE, 2020, 16 (03)