NUDA: Non-Uniform Directory Architecture for Scalable Chip Multiprocessors

Cited by: 5

Authors:

Shu, Wei [1]

Tzeng, Nian-Feng [1]

Affiliation:

[1] Univ Louisiana Lafayette, CACS, Lafayette, LA 70503 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2018 / Vol. 67 / No. 05

Funding:

U.S. National Science Foundation;

Keywords:

Chip multi-processors; coherence protocols; hash tables; memory hierarchy; prefetching; sharer tracking directory;

DOI:

10.1109/TC.2017.2773061

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Chip multiprocessors (CMPs) incur directory storage overhead when cache coherence is realized via sharer tracking. This work proposes a novel framework dubbed non-uniform directory architecture (NUDA), leveraging two insights: that the number of "active" directory entries required to stay on chip is usually small for a short execution time window due to high directory locality, and that the fraction of interrogated directory entries drops as the core count rises. Unlike earlier storage overhead reduction techniques that require all cached LLC blocks to have their directory entries fully on chip, NUDA dynamically buffers only the most active directory vectors (DVs) on chip while keeping the DVs of all LLC blocks in a backing store in lower-level storage. NUDA attains its efficiency via a criticality-aware replacement policy (CARP) for on-chip buffer management and effective prefetching to pre-activate vectors (PAVE) for upcoming coherence interrogations. We have evaluated NUDA via gem5 simulation for 64-core CMPs under the PARSEC and SPLASH benchmarks, demonstrating that CARP and PAVE enhance on-chip directory storage efficiency significantly. NUDA with a small on-chip buffer for DVs exhibits negligible performance degradation (within 2.6 percent) compared to a full on-chip directory, while outperforming its previous counterparts in directory area reduction when the on-chip directory budget is scarce, as needed for high scalability.
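The buffering idea in the abstract can be illustrated with a toy model: a small on-chip buffer holds the most recently used directory vectors, and evicted vectors are written back to a backing store. This is only a sketch under stated assumptions. The abstract does not describe how CARP or PAVE work, so plain LRU eviction is substituted for CARP and prefetching is left out entirely. The class name `BufferedDirectory` and its methods are hypothetical, not taken from the paper.

```python
from collections import OrderedDict


class BufferedDirectory:
    """Toy model: a small on-chip buffer of directory vectors (DVs)
    in front of a backing store that holds evicted DVs.

    Assumption: plain LRU eviction stands in for CARP, and there is
    no prefetching (PAVE); neither is specified in the abstract.
    """

    def __init__(self, capacity):
        self.capacity = capacity      # on-chip buffer size, counted in DVs
        self.on_chip = OrderedDict()  # block address -> set of sharer core ids
        self.backing = {}             # backing store, updated on eviction
        self.hits = 0
        self.misses = 0

    def _activate(self, block):
        """Return the on-chip DV for a block, fetching it on a miss."""
        if block in self.on_chip:
            self.hits += 1
            self.on_chip.move_to_end(block)  # mark as most recently used
            return self.on_chip[block]
        self.misses += 1
        if len(self.on_chip) >= self.capacity:
            # Evict the least recently used DV and write it back.
            victim, victim_dv = self.on_chip.popitem(last=False)
            self.backing[victim] = victim_dv
        # Copy the DV from the backing store (empty if never tracked).
        dv = set(self.backing.get(block, set()))
        self.on_chip[block] = dv
        return dv

    def add_sharer(self, block, core):
        """Record that a core holds a copy of the block."""
        self._activate(block).add(core)

    def sharers(self, block):
        """Return a copy of the sharer set for a coherence interrogation."""
        return set(self._activate(block))
```

With a capacity of two, touching a third block evicts the least recently used vector to the backing store, and a later interrogation of that block counts as a miss that brings the vector back on chip.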

Citation

Pages: 740 - 747

Page count: 8

