Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling

Times cited: 7

Authors:

Caheny, Paul [1,3]

Casas, Marc [1,3]

Moreto, Miguel [1,3]

Gloaguen, Herve [2]

Saintes, Maxime [2]

Ayguade, Eduard [1,3]

Labarta, Jesus [1,3]

Valero, Mateo [1,3]

Affiliations:

[1] Barcelona Supercomp Ctr, Barcelona, Spain

[2] Bull Atos Technol, Les Clayes Sous Bois, France

[3] Univ Politecn Cataluna, Dept Arquitectura Comp, Barcelona, Spain

Source:

2016 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURE AND COMPILATION TECHNIQUES (PACT) | 2016

Funding:

European Union Horizon 2020

Keywords:

Cache Coherence; NUMA; Task-based programming models; Architecture

DOI:

10.1145/2967938.2967962

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Subject classification code:

0812 (Computer Science and Technology)

Abstract:

Cache Coherent NUMA (ccNUMA) architectures are a widespread paradigm due to the benefits they provide for scaling core count and memory capacity. Also, the flat memory address space they offer considerably improves programmability. However, ccNUMA architectures require sophisticated and expensive cache coherence protocols to enforce correctness during parallel executions, which trigger a significant amount of on- and off-chip traffic in the system. This paper analyses how coherence traffic may be best constrained in a large, real ccNUMA platform through the use of a joint hardware/software approach. For several benchmarks, we study coherence traffic in detail under the influence of an added hierarchical cache layer in the directory protocol combined with runtime-managed NUMA-aware scheduling and data allocation techniques to make the most efficient use of the added hardware. The effectiveness of this joint approach is demonstrated by speedups of 1.23x to 2.54x and coherence traffic reductions of between 44% and 77% in comparison to NUMA-oblivious scheduling and data allocation. Furthermore, we show that the NUMA-aware techniques we employ at the runtime level are crucial to ensure the added hierarchical layer in the directory coherence protocol does not introduce significant coherence traffic to the system.
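To make the scheduling half of the abstract more concrete, the following is a toy Python model, not the authors' runtime or hardware. It assumes a hypothetical 4-node machine, a hypothetical interleaved placement in which data block b lives on node b % 4, and a made-up task mix, and it counts accesses to blocks homed on a different node as a rough proxy for inter-node coherence traffic. It contrasts NUMA-oblivious round-robin task placement with a NUMA-aware policy that runs each task on the node holding most of its data. It does not model the hierarchical directory cache layer at all.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    blocks: tuple[int, ...]  # ids of the data blocks the task accesses


def home_node(block: int, num_nodes: int) -> int:
    # Hypothetical data allocation: block b is placed on node b % num_nodes.
    return block % num_nodes


def place_oblivious(tasks: list[Task], num_nodes: int) -> dict[str, int]:
    # NUMA-oblivious: deal tasks to nodes round-robin, ignoring data location.
    return {t.name: i % num_nodes for i, t in enumerate(tasks)}


def place_numa_aware(tasks: list[Task], num_nodes: int) -> dict[str, int]:
    # NUMA-aware: run each task on the node that is home to most of its blocks
    # (ties broken in favour of the lowest node id).
    placement = {}
    for t in tasks:
        counts = Counter(home_node(b, num_nodes) for b in t.blocks)
        placement[t.name] = min(counts, key=lambda n: (-counts[n], n))
    return placement


def remote_accesses(tasks: list[Task], placement: dict[str, int], num_nodes: int) -> int:
    # Proxy for off-node coherence traffic: one unit for every block access
    # whose home node differs from the node the task runs on.
    return sum(
        1
        for t in tasks
        for b in t.blocks
        if home_node(b, num_nodes) != placement[t.name]
    )


NUM_NODES = 4
# Synthetic workload: task i touches three blocks homed on node (i + 1) % 4
# and one block homed on node i % 4.
tasks = [
    Task(f"t{i}", ((i + 1) % NUM_NODES, (i + 1) % NUM_NODES + 4,
                   (i + 1) % NUM_NODES + 8, i % NUM_NODES + 12))
    for i in range(8)
]
oblivious = remote_accesses(tasks, place_oblivious(tasks, NUM_NODES), NUM_NODES)
aware = remote_accesses(tasks, place_numa_aware(tasks, NUM_NODES), NUM_NODES)
print(oblivious, aware)  # 24 8
```

With this synthetic task mix, NUMA-aware placement cuts remote accesses from 24 to 8. That figure reflects only the invented workload and placement rule; it is not comparable to the coherence traffic reductions measured in the paper.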


Pages: 275-286

Page count: 12
