Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors

Cited: 0
Authors
Kumar, Sanjeev [1 ]
Hughes, Christopher J. [1 ]
Nguyen, Anthony [1 ]
Affiliations
[1] Intel, Microprocessor Technol Labs, Santa Clara, CA 95052 USA
Keywords
CMP; loop and task parallelism; architectural support;
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing technology and computer technology];
Discipline code
0812;
Abstract
Chip multiprocessors (CMPs) are now commonplace, and the number of cores on a CMP is likely to grow steadily. However, in order to harness the additional compute resources of a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel "tasks" and allow an underlying software layer to schedule these tasks to different threads. Software task scheduling can provide good parallel performance as long as tasks are large compared to the software overheads. We examine a set of applications from an important emerging domain: Recognition, Mining, and Synthesis (RMS). Many RMS applications are compute-intensive and have abundant thread-level parallelism, and are therefore good targets for running on a CMP. However, a significant number have small tasks for which software task schedulers achieve only limited parallel speedups. We propose Carbon, a hardware technique to accelerate dynamic task scheduling on scalable CMPs. Carbon has relatively simple hardware, most of which can be placed far from the cores. We compare Carbon to some highly tuned software task schedulers for a set of RMS benchmarks with small tasks. Carbon delivers significant performance improvements over the best software scheduler: on average for 64 cores, 68% faster on a set of loop-parallel benchmarks, and 109% faster on a set of task-parallel benchmarks.
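The software approach the abstract contrasts Carbon against can be illustrated with a minimal sketch: tasks are pushed onto a shared queue and worker threads repeatedly dequeue and execute them. This is a hypothetical illustration, not code from the paper; the function name `run_tasks` and its structure are assumptions. The per-task cost of the dequeue and result bookkeeping is exactly the software overhead the abstract says dominates when tasks are small.

```python
# Hypothetical sketch of a software task scheduler: a shared queue of
# callables drained by worker threads. Not from the Carbon paper -- it
# illustrates the per-task dequeue/synchronization overhead that hardware
# task scheduling aims to eliminate.
import threading
from queue import Queue, Empty

def run_tasks(tasks, num_workers=4):
    """Execute the given callables on worker threads; return their results."""
    q = Queue()
    results = []
    lock = threading.Lock()

    for t in tasks:
        q.put(t)

    def worker():
        while True:
            try:
                # Every dequeue pays queue-locking overhead -- significant
                # relative to a task that runs for only a few microseconds.
                task = q.get_nowait()
            except Empty:
                return
            r = task()
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

if __name__ == "__main__":
    # Eight tiny tasks: scheduling overhead rivals the work itself.
    squares = run_tasks([lambda i=i: i * i for i in range(8)])
    print(sorted(squares))
```

With tasks this small, the scheduler's bookkeeping can exceed the useful work per task, which is the regime where the abstract reports software schedulers achieve only limited speedups.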
Pages: 162-173
Page count: 12
Related papers
50 records
  • [1] Exploiting fine-grained data parallelism with chip multiprocessors and fast barriers
    Sampson, Jack
    Gonzalez, Ruben
    Collard, Jean-Francois
    Jouppi, Norman P.
    Schlansker, Mike
    Calder, Brad
    MICRO-39: PROCEEDINGS OF THE 39TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, 2006, : 235 - +
  • [2] Support for fine-grained synchronization in shared-memory multiprocessors
    Vlassov, Vladimir
    Merino, Oscar Sierra
    Moritz, Csaba Andras
    Popov, Konstantin
    PARALLEL COMPUTING TECHNOLOGIES, PROCEEDINGS, 2007, 4671 : 453 - 467
  • [3] Fine-Grained Parallelism in Ellie
    Andersen, B.
    JOURNAL OF OBJECT-ORIENTED PROGRAMMING, 1992, 5 (03): : 55 - 61
  • [4] Fine-grained task reweighting on multiprocessors
    Block, A
    Anderson, JH
    Bishop, G
    11TH IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND REAL-TIME COMPUTING SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2005, : 429 - 435
  • [5] Fine-grained parallelism in computational mathematics
    Bandman, OL
    PROGRAMMING AND COMPUTER SOFTWARE, 2001, 27 (04) : 170 - 182
  • [6] Fine-Grained Parallelism in Computational Mathematics
    O. L. Bandman
    Programming and Computer Software, 2001, 27 : 170 - 182
  • [7] Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip
    Li, Sheng
    Kuntz, Shannon
    Brockman, Jay B.
    Kogge, Peter M.
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2011, 22 (07) : 1178 - 1191
  • [8] Evaluation of Fine-grained Parallelism in AUTOSAR Applications
    Stegmeier, Alexander
    Kehr, Sebastian
    George, Dave
    Bradatsch, Christian
    Panic, Milos
    Bodekker, Bert
    Ungerer, Theo
    INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION (SAMOS 2017), 2017, : 121 - 128
  • [9] A Matching Approach to Utilizing Fine-Grained Parallelism
    Gupta, R.
    Soffa, M. L.
    PROCEEDINGS OF THE TWENTY-FIRST ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOLS 1-4: ARCHITECTURE TRACK, SOFTWARE TRACK, DECISION SUPPORT AND KNOWLEDGE BASED SYSTEMS TRACK, APPLICATIONS TRACK, 1988, : 148 - 156
  • [10] Exploiting Fine-Grained Parallelism on Cell Processors
    Hoffmann, Ralf
    Prell, Andreas
    Rauber, Thomas
    EURO-PAR 2010 - PARALLEL PROCESSING, PART II, 2010, 6272 : 175 - 186