NUDA: Non-Uniform Directory Architecture for Scalable Chip Multiprocessors

Cited by: 5
Authors
Shu, Wei [1]
Tzeng, Nian-Feng [1]
Affiliations
[1] Univ Louisiana Lafayette, CACS, Lafayette, LA 70503 USA
Funding
National Science Foundation (USA);
Keywords
Chip multi-processors; coherence protocols; hash tables; memory hierarchy; prefetching; sharer tracking directory;
DOI
10.1109/TC.2017.2773061
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Chip multiprocessors (CMPs) incur directory storage overhead when cache coherence is realized via sharer tracking. This work proposes a novel framework dubbed non-uniform directory architecture (NUDA), which leverages two insights: the number of "active" directory entries that must stay on chip is usually small within a short execution time window due to high directory locality, and the fraction of interrogated directory entries drops as the core count rises. Unlike earlier storage overhead reduction techniques, which require all cached LLC blocks to have their directory entries fully on chip, NUDA dynamically buffers only the most active directory vectors (DVs) on chip while keeping the DVs of all LLC blocks in a backing store in low-level storage. NUDA attains its superior efficiency via an inventive criticality-aware replacement policy (CARP) for on-chip buffer management and effective prefetching to pre-activate vectors (PAVE) for upcoming coherence interrogations. We have evaluated NUDA by gem5 simulation for 64-core CMPs under the PARSEC and SPLASH benchmarks, demonstrating that CARP and PAVE significantly enhance on-chip directory storage efficiency. NUDA with a small on-chip buffer for DVs exhibits negligible performance degradation (within 2.6 percent) compared to a full on-chip directory, while outperforming its previous counterparts in directory area reduction when the on-chip directory budget is scarcely provisioned for high scalability.
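
The buffering idea in the abstract can be illustrated with a toy model: a small on-chip buffer holds the most recently used directory vectors, and every vector also has a home in a backing store. This is only a sketch under stated assumptions, not the authors' design. Plain LRU replacement stands in for CARP, whose details the abstract does not give, and PAVE prefetching is not modeled. The class name `DirectoryBuffer` and its methods are hypothetical.

```python
from collections import OrderedDict


class DirectoryBuffer:
    """Toy model: small on-chip DV buffer over a full backing store.

    Assumption: LRU replacement is a stand-in for CARP (not described in
    the abstract). Prefetching (PAVE) is not modeled.
    """

    def __init__(self, capacity, num_cores):
        self.capacity = capacity
        self.num_cores = num_cores
        self.buffer = OrderedDict()  # block address -> sharer bit vector (int)
        self.backing = {}            # DVs written back from the buffer
        self.hits = 0
        self.misses = 0

    def _activate(self, block):
        """Bring the DV of `block` on chip, evicting the LRU entry if full."""
        if block in self.buffer:
            self.hits += 1
            self.buffer.move_to_end(block)  # mark as most recently used
            return
        self.misses += 1
        if len(self.buffer) >= self.capacity:
            victim, dv = self.buffer.popitem(last=False)  # least recently used
            self.backing[victim] = dv                     # write back
        # An untracked block starts with an empty sharer vector.
        self.buffer[block] = self.backing.get(block, 0)

    def add_sharer(self, block, core):
        """Record that `core` holds a copy of `block`."""
        self._activate(block)
        self.buffer[block] |= 1 << core

    def sharers(self, block):
        """Return the list of cores recorded as sharing `block`."""
        self._activate(block)
        dv = self.buffer[block]
        return [c for c in range(self.num_cores) if (dv >> c) & 1]


if __name__ == "__main__":
    d = DirectoryBuffer(capacity=2, num_cores=64)
    d.add_sharer(0x100, 3)
    d.add_sharer(0x100, 7)
    d.add_sharer(0x200, 1)
    d.add_sharer(0x300, 5)   # buffer is full, so the DV of 0x100 is written back
    print(d.sharers(0x100))  # DV is re-activated from the backing store
    print(d.hits, d.misses)
```

In this toy model, a lookup that misses the buffer still returns the correct sharers because the evicted vector was written back. The cost of such a miss (the latency of reaching the backing store) is what a replacement policy and prefetcher would aim to hide.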
Pages: 740-747
Page count: 8