Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling

Citations: 7
Authors
Caheny, Paul [1, 3]
Casas, Marc [1, 3]
Moreto, Miquel [1, 3]
Gloaguen, Herve [2]
Saintes, Maxime [2]
Ayguade, Eduard [1, 3]
Labarta, Jesus [1, 3]
Valero, Mateo [1, 3]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Bull Atos Technol, Les Clayes Sous Bois, France
[3] Univ Politecn Cataluna, Dept Arquitectura Comp, Barcelona, Spain
Funding
EU Horizon 2020
Keywords
Cache Coherence; NUMA; Task-based programming models; ARCHITECTURE;
DOI
10.1145/2967938.2967962
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime-managed NUMA-aware scheduling and data allocation techniques to make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.
Pages: 275 - 286
Number of pages: 12
Related papers
50 items in total
  • [31] An adaptive cache coherence protocol: Trading storage for traffic
    Menezo, Lucia G.
    Puente, Valentin
    Gregorio, Jose-Angel
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2017, 102 : 163 - 174
  • [32] A cache-aware scheduling algorithm for embedded systems
    Luculli, G
    Di Natale, M
    18TH IEEE REAL-TIME SYSTEMS SYMPOSIUM, PROCEEDINGS, 1997, : 199 - 209
  • [33] Runtime-Assisted Cache Coherence Deactivation in Task Parallel Programs
    Caheny, Paul
    Alvarez, Lluc
    Valero, Mateo
    Moreto, Miquel
    Casas, Marc
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE, AND ANALYSIS (SC'18), 2018,
  • [34] Cache-Aware Dynamic Classification and Scheduling for Linux
    Gollapudi, Ravi Theja
    Yuksek, Gokturk
    Ghose, Kanad
    2019 IEEE SYMPOSIUM IN LOW-POWER AND HIGH-SPEED CHIPS (COOL CHIPS 22), 2019,
  • [35] CRUISE: Cache Replacement and Utility-aware Scheduling
    Jaleel, Aamer
    Najaf-abadi, Hashem H.
    Subramaniam, Samantika
    Steely, Simon C., Jr.
    Emer, Joel
    ASPLOS XVII: SEVENTEENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2012, : 249 - 259
  • [36] Register aware scheduling for distributed cache clustered architecture
    Wang, Z
    Hu, XS
    Sha, EHM
    ASP-DAC 2003: PROCEEDINGS OF THE ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, 2003, : 71 - 76
  • [37] Impact of cache coherence protocols on the processing of network traffic
    Kumar, Amit
    Huggahalli, Ram
    MICRO-40: PROCEEDINGS OF THE 40TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2007, : 161 - +
  • [38] CRUISE: Cache Replacement and Utility-aware Scheduling
    Jaleel, Aamer
    Najaf-Abadi, Hashem H.
    Subramaniam, Samantika
    Steely, Simon C., Jr.
    Emer, Joel
    ACM SIGPLAN NOTICES, 2012, 47 (04) : 249 - 259
  • [39] Cache Utilization-Aware Scheduling for Multicore Processors
    Chu, Edward T. -H.
    Lu, Wen-wei
    2012 IEEE ASIA PACIFIC CONFERENCE ON CIRCUITS AND SYSTEMS (APCCAS), 2012, : 368 - 371
  • [40] DIRECTORY-BASED CACHE COHERENCE IN LARGE-SCALE MULTIPROCESSORS
    CHAIKEN, D
    FIELDS, C
    KURIHARA, K
    AGARWAL, A
    COMPUTER, 1990, 23 (06) : 49 - 58