An architecture for high-performance scalable shared-memory multiprocessors exploiting on-chip integration

Cited by: 14

Authors:

Acacio, ME

González, J

García, JM

Duato, J

Affiliations:

[1] Univ Murcia, Dept Ingn & Tecnol Comp, Fac Informat, E-30071 Murcia, Spain

[2] Intel Labs Barcelona, Intel Barcelona Res Ctr, Barcelona 08034, Spain

[3] Univ Politecn Valencia, Dept Informat Sistemas & Comp, Valencia 46010, Spain

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2004 / Vol. 15 / Issue 08

Keywords:

cc-NUMA multiprocessor; directory memory overhead; L2 miss latency; three-level directory; shared data cache; on-processor-chip integration;

DOI:

10.1109/TPDS.2004.27

Chinese Library Classification number:

TP301 [Theory, Methods];

Subject classification code:

081202 ;

Abstract:

Recent technology improvements allow multiprocessor designers to put some key components inside the processor chip, such as the memory controller, the coherence hardware, and the network interface/router. In this paper, we exploit this scale of integration, presenting a novel node architecture aimed at reducing the long L2 miss latencies and the directory memory overhead that characterize cc-NUMA machines and limit their scalability. Our proposal replaces the traditional directory with a novel three-level directory architecture and adds a small shared data cache to each node of a multiprocessor system. Due to their small size, the first-level directory and the shared data cache are integrated into the processor chip in every node, which enhances performance by saving accesses to the slower main memory. Scalability is guaranteed by keeping the second- and third-level directories off the processor chip and using compressed data structures. A taxonomy of L2 misses, according to the actions performed by the directory to satisfy them, is also presented. Using execution-driven simulations, we show that the proposed node architecture yields significant latency reductions, which translate into reductions of more than 30 percent in application execution time in several cases.

Citation

Pages: 755-768

Page count: 14

50 items in total

[1] Hardware Fault Containment in Scalable Shared-Memory Multiprocessors Architecture
Teodosiu, D.
Baxter, J.
Govil, K.
Chapin, J.
Computer Architecture News, 25 (02):
[2] A novel lightweight directory architecture for scalable shared-memory multiprocessors
Ros, A
Acacio, ME
García, JM
EURO-PAR 2005 PARALLEL PROCESSING, PROCEEDINGS, 2005, 3648 : 582 - 591
[3] Parallelization of benchmarks for scalable shared-memory multiprocessors
Paek, Y
Navarro, A
Zapata, E
Padua, D
1998 INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 1998, : 401 - 408
[4] SCALABLE CACHE COHERENCE FOR SHARED-MEMORY MULTIPROCESSORS
THAPAR, M
DELAGI, BA
FLYNN, MJ
LECTURE NOTES IN COMPUTER SCIENCE, 1992, 591 : 1 - 12
[5] ALGORITHMS FOR SCALABLE SYNCHRONIZATION ON SHARED-MEMORY MULTIPROCESSORS
MELLORCRUMMEY, JM
SCOTT, ML
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1991, 9 (01) : 21 - 65
[6] Data forwarding in scalable shared-memory multiprocessors
Koufaty, DA
Chen, XF
Poulsen, DK
Torrellas, J
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1996, 7 (12) : 1250 - 1264
[7] Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration
Acacio, ME
González, J
García, JM
Duato, J
10TH EUROMICRO WORKSHOP ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PROCEEDINGS, 2002, : 368 - 375
[8] COOPERATIVE SHARED-MEMORY - SOFTWARE AND HARDWARE FOR SCALABLE MULTIPROCESSORS
HILL, MD
LARUS, JR
REINHARDT, SK
WOOD, DA
ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1993, 11 (04) : 300 - 318
[9] Coherence controller architectures for scalable shared-memory multiprocessors
Michael, MM
Nanda, AK
Lim, BH
IEEE TRANSACTIONS ON COMPUTERS, 1999, 48 (02) : 245 - 255
[10] A scalable and efficient storage allocator on shared-memory multiprocessors
Vee, V.-Y. (vyvee@singnet.com.sg), 2001, World Scientific Publishing Co. Pte Ltd (11) : 2 - 3
