Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors

Cited by: 0
Authors
Kumar, Sanjeev [1 ]
Hughes, Christopher J. [1 ]
Nguyen, Anthony [1 ]
Affiliations
[1] Intel, Microprocessor Technology Labs, Santa Clara, CA 95052 USA
Keywords
CMP; loop and task parallelism; architectural support
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel "tasks" and allow an underlying software layer to schedule these tasks to different threads. Software task scheduling can provide good parallel performance as long as tasks are large compared to the software overheads. We examine a set of applications from an important emerging domain: Recognition, Mining, and Synthesis (RMS). Many RMS applications are compute-intensive and have abundant thread-level parallelism, and are therefore good targets for running on a CMP. However, a significant number have small tasks for which software task schedulers achieve only limited parallel speedups. We propose Carbon, a hardware technique to accelerate dynamic task scheduling on scalable CMPs. Carbon has relatively simple hardware, most of which can be placed far from the cores. We compare Carbon to some highly tuned software task schedulers for a set of RMS benchmarks with small tasks. Carbon delivers significant performance improvements over the best software scheduler: on average for 64 cores, 68% faster on a set of loop-parallel benchmarks, and 109% faster on a set of task-parallel benchmarks.
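
A minimal sketch of the general approach described above, assuming a hypothetical TaskPool runtime: the iteration space of a loop is split into chunks, each chunk becomes a task, and worker threads pull tasks from a single shared queue. This is a generic C++ illustration of software task scheduling, not the tuned schedulers or the Carbon hardware evaluated in the paper; with small tasks, the per-task locking and queueing overhead visible here is what limits parallel speedup.

// Minimal software task-scheduling sketch (generic illustration only; the
// TaskPool class and chunk size below are hypothetical, not the paper's
// schedulers or Carbon's hardware queues). Worker threads pull tasks from
// a single mutex-protected queue; with very small tasks this enqueue/
// dequeue overhead dominates the useful work.
#include <algorithm>
#include <atomic>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class TaskPool {
public:
    explicit TaskPool(unsigned nthreads) {
        for (unsigned i = 0; i < nthreads; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~TaskPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    void enqueue(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // shutdown requested, no work left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // execute the task on this worker thread
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

int main() {
    const int N = 1 << 20;            // 1M elements, all ones
    std::vector<int> data(N, 1);
    std::atomic<long long> sum{0};
    std::atomic<int> pending{0};

    unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    TaskPool pool(nthreads);

    // Loop-parallel decomposition: each chunk of the iteration space becomes
    // one task. Smaller chunks expose more parallelism but pay the scheduling
    // overhead more often per unit of useful work.
    const int chunk = 4096;
    for (int lo = 0; lo < N; lo += chunk) {
        const int hi = std::min(lo + chunk, N);
        pending.fetch_add(1);
        pool.enqueue([&, lo, hi] {
            long long local = 0;
            for (int i = lo; i < hi; ++i) local += data[i];
            sum.fetch_add(local);
            pending.fetch_sub(1);
        });
    }
    while (pending.load() != 0) std::this_thread::yield();  // crude join

    std::cout << "sum = " << sum.load() << " (expected " << N << ")\n";
    return 0;
}
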
Pages: 162 - 173
Page count: 12
Related Papers
50 records in total (entries [31]-[40] shown below)
  • [31] Testing fine-grained parallelism for the ADMM on a factor-graph
    Hao, Ning
    Oghbaee, AmirReza
    Rostami, Mohammad
    Derbinsky, Nate
    Bento, Jose
    2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 835 - 844
  • [32] Architectural support for exploitation of fine-grain parallelism
    [Author unknown]
    EXPLOITATION OF FINE-GRAIN PARALLELISM, 1995, 942 : 32 - 37
  • [33] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Deng, Jia
    Krause, Jonathan
    Fei-Fei, Li
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587
  • [34] Balancing Fine-Grained and Medium-Grained Parallelism in Scheduling Loops for the XIMD Architecture
    Newburn, C. J.
    Huang, A. S.
    Shen, J. P.
    IFIP TRANSACTIONS A-COMPUTER SCIENCE AND TECHNOLOGY, 1993, 23 : 39 - 52
  • [35] Fine-Grained DVFS Using On-Chip Regulators
    Eyerman, Stijn
    Eeckhout, Lieven
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2011, 8 (01)
  • [36] Towards Fine-grained Parallelism in Parallel and Distributed Python Libraries
    Kerney, Jamison
    Raicu, Ioan
    Chard, Kyle
    2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024, : 706 - 715
  • [37] Reducing Query Latencies in Web Search Using Fine-Grained Parallelism
    Frachtenberg, Eitan
    World Wide Web, 2009, 12 : 441 - 460
  • [38] Tool support for fine-grained software inspection
    Anderson, P
    Reps, T
    Teitelbaum, T
    Zarins, M
    IEEE SOFTWARE, 2003, 20 (04) : 42+
  • [39] Automatic Parallelization of Fine-Grained Metafunctions on a Chip Multiprocessor
    Lee, Sanghoon
    Tuck, James
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 10 (04)
  • [40] A Fine-grained Asynchronous Bulk Synchronous parallelism model for PGAS applications
    Paul, Sri Raj
    Hayashi, Akihiro
    Chen, Kun
    Elmougy, Youssef
    Sarkar, Vivek
    JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 69