Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors

Cited by: 0
Authors
Kumar, Sanjeev [1 ]
Hughes, Christopher J. [1 ]
Nguyen, Anthony [1 ]
Affiliations
[1] Intel, Microprocessor Technol Labs, Santa Clara, CA 95052 USA
Keywords
CMP; loop and task parallelism; architectural support;
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel "tasks" and allow an underlying software layer to schedule these tasks to different threads. Software task scheduling can provide good parallel performance as long as tasks are large compared to the software overheads. We examine a set of applications from an important emerging domain: Recognition, Mining, and Synthesis (RMS). Many RMS applications are compute-intensive and have abundant thread-level parallelism, and are therefore good targets for running on a CMP. However, a significant number have small tasks for which software task schedulers achieve only limited parallel speedups. We propose Carbon, a hardware technique to accelerate dynamic task scheduling on scalable CMPs. Carbon has relatively simple hardware, most of which can be placed far from the cores. We compare Carbon to some highly tuned software task schedulers for a set of RMS benchmarks with small tasks. Carbon delivers significant performance improvements over the best software scheduler: on average for 64 cores, 68% faster on a set of loop-parallel benchmarks, and 109% faster on a set of task-parallel benchmarks.
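The abstract contrasts software task scheduling, whose per-task overheads limit speedups for fine-grained tasks, with Carbon's hardware task queues. Below is a rough C++ sketch of the software side only; it is an assumed illustration, not code from the paper. The TaskPool class and its members are hypothetical names, and real schedulers (e.g., work-stealing deques) are considerably more refined. The point it shows is that every enqueue and dequeue pays a locking cost in software, which dominates when tasks are small.
```cpp
// Minimal sketch (assumed, not from the paper) of a shared-queue software
// task scheduler: workers repeatedly lock a central queue, pop a task, and
// run it. The per-task locking is the kind of software overhead that limits
// speedups for small tasks, the gap Carbon's hardware queues target.
#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TaskPool {                                   // hypothetical helper class
public:
    explicit TaskPool(unsigned nthreads) {
        for (unsigned i = 0; i < nthreads; ++i)
            workers_.emplace_back([this] { run(); });
    }
    void submit(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mtx_);    // software cost paid per task
        tasks_.push(std::move(task));
    }
    void finish() {                                // call after all tasks are submitted
        done_ = true;
        for (auto& w : workers_) w.join();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::lock_guard<std::mutex> lock(mtx_);  // per-dequeue locking cost
                if (tasks_.empty()) {
                    if (done_) return;             // no more work will arrive
                    continue;                      // busy-wait; real schedulers back off
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                                // useful work: must be large enough
        }                                          // to amortize the locking above
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mtx_;
    std::atomic<bool> done_{false};
};

int main() {
    std::atomic<long long> sum{0};
    TaskPool pool(4);
    for (int i = 0; i < 100000; ++i)               // many tiny tasks: scheduling
        pool.submit([&sum, i] { sum += i; });      // overhead dominates the work
    pool.finish();
    return sum == 99999LL * 100000 / 2 ? 0 : 1;    // sanity check
}
```
With tasks this small, most of each worker's time goes to the locking in run() rather than to the task body itself, which is the regime where the abstract reports software schedulers achieving only limited speedups.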
Pages: 162 - 173
Page count: 12