Task Parallel Framework and Its Application in Nested Parallel Algorithms on the SW26010 Many-core Platform

Authors
Sun Q. [1]
Li L.-S. [1]
Zhao H.-T. [1]
Zhao H. [1]
Wu C.-M. [1]
Affiliation
[1] Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences, Beijing
Source
Wu, Chang-Mao (changmaowu@foxmail.com) | Year 1600 / Volume: Chinese Academy of Sciences / Issue 32
Keywords
Nested parallel algorithm; Parallel computing; SW26010 many-core CPU; SWAN; Task parallel framework
DOI
10.13328/j.cnki.jos.006007
Abstract
Task parallelism is one of the fundamental patterns for designing parallel algorithms. Because of algorithmic complexity and distinctive hardware features, however, implementing algorithms with task parallelism often remains challenging. This study proposes SWAN, a general runtime framework that supports nested task parallelism on the new SW26010 many-core CPU platform. SWAN provides high-level abstractions for implementing task parallelism, so programmers can focus mainly on the algorithm itself and work more productively. For performance, the shared resources and information managed by SWAN are partitioned in a fine-grained manner to avoid heavy contention among worker threads. SWAN's core data structures exploit the platform's high-bandwidth memory access mechanism, fast on-chip scratchpad cache, and atomic operations to reduce the overhead of SWAN itself. In addition, SWAN provides dynamic load-balancing strategies at runtime to keep the threads fully occupied. In the experiments, a set of recursive algorithms with nested parallelism, including the N-queens problem, binary-tree traversal, quick sort, and convex hull, are implemented with SWAN on the target platform. The results show that each algorithm achieves a significant speedup, from 4.5x to 32x, over its serial counterpart, which suggests that SWAN offers high usability and performance. © Copyright 2021, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
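The ideas named in the abstract (nested task spawning, per-worker partitioning of shared state, and dynamic load balancing) can be illustrated with a minimal sketch on the N-queens problem. This is not SWAN's API, which the abstract does not describe. The function names, the `cutoff` parameter, and the use of work stealing as the load-balancing strategy are all assumptions made for illustration. Python threads also share an interpreter lock, so the sketch shows only the structure, not a speedup.

```python
import threading
import time
from collections import deque

def count_serial(n, row, cols, d1, d2):
    """Count completions of a partial placement with plain recursion."""
    if row == n:
        return 1
    total = 0
    for c in range(n):
        if c in cols or (row + c) in d1 or (row - c) in d2:
            continue
        total += count_serial(n, row + 1, cols | {c},
                              d1 | {row + c}, d2 | {row - c})
    return total

def count_parallel(n, workers=4, cutoff=2):
    """Hypothetical sketch: nested tasks on per-worker deques with stealing."""
    queues = [deque() for _ in range(workers)]  # one private queue per worker
    counts = [0] * workers                      # per-worker partial results
    pending = [1]                               # tasks created but not finished
    lock = threading.Lock()                     # guards only the pending counter
    empty = frozenset()
    queues[0].append((0, empty, empty, empty))  # root task: empty board

    def next_task(me):
        try:
            return queues[me].pop()             # own newest task first
        except IndexError:
            pass
        for k in range(1, workers):             # otherwise steal an oldest task
            try:
                return queues[(me + k) % workers].popleft()
            except IndexError:
                continue
        return None

    def run(me):
        while True:
            task = next_task(me)
            if task is None:
                with lock:
                    if pending[0] == 0:         # nothing queued or running
                        return
                time.sleep(0.0001)              # back off, then look again
                continue
            row, cols, d1, d2 = task
            if row >= cutoff or row == n:
                # Deep enough: finish this subtree serially.
                counts[me] += count_serial(n, row, cols, d1, d2)
            else:
                # Nested spawn: every legal queen in this row is a child task.
                children = [(row + 1, cols | {c}, d1 | {row + c}, d2 | {row - c})
                            for c in range(n)
                            if c not in cols and (row + c) not in d1
                            and (row - c) not in d2]
                with lock:
                    pending[0] += len(children)  # register before publishing
                queues[me].extend(children)
            with lock:
                pending[0] -= 1                  # this task is finished

    threads = [threading.Thread(target=run, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

Each worker touches only its own queue and its own slot in `counts` during normal operation, so the single lock protects nothing but the termination counter. The `cutoff` depth trades task granularity against scheduling overhead: a larger value creates more, smaller tasks for idle workers to steal.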
Pages: 2352-2364
Number of pages: 12