ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15
Authors
Hou, Kaixi [1]
Wang, Hao [1]
Feng, Wu-chun [1]
Affiliation
[1] Virginia Tech, Dept. of Computer Science, Blacksburg, VA 24060, USA
Funding
U.S. National Science Foundation (NSF)
Keywords
sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512
DOI
10.1145/2751205.2751247
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812 (Computer Science and Technology)
Abstract
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
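To make the three stages named in the abstract more concrete, here is a minimal scalar model in Python, assuming a vector width of W = 4 lanes and modeling each vector register as a plain list. The helpers `vmin`, `vmax`, `permute`, `sort_columns`, `transpose`, `bitonic_merge`, and `sort16` are hypothetical stand-ins for illustration, not ASPaS code or real ISA intrinsics. The 4-input comparator network `NET4` and the bitonic merge are our own illustrative choices, since ASPaS accepts any sorting network. Applying each comparator as a lane-wise min/max between two registers sorts every column at once. A transpose then turns sorted columns into sorted registers, and a merge network combines sorted registers pairwise. The register reversal inside the merge is written as a vector-matrix product with a 0/1 permutation matrix, echoing (not reproducing) the abstract's formalization.

```python
# Scalar model of sort -> transpose -> merge on W-lane "registers".
# All helpers are hypothetical stand-ins for SIMD intrinsics, not ASPaS output.
from typing import List, Tuple

Vec = List[int]

# Optimal 4-input sorting network (5 comparators); any network could be used.
NET4: List[Tuple[int, int]] = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def vmin(a: Vec, b: Vec) -> Vec:
    """Lane-wise minimum of two registers."""
    return [min(x, y) for x, y in zip(a, b)]


def vmax(a: Vec, b: Vec) -> Vec:
    """Lane-wise maximum of two registers."""
    return [max(x, y) for x, y in zip(a, b)]


def permute(v: Vec, p: List[List[int]]) -> Vec:
    """Permute lanes as a vector-matrix product v * P, P a 0/1 permutation matrix."""
    n = len(v)
    return [sum(v[i] * p[i][j] for i in range(n)) for j in range(n)]


def sort_columns(regs: List[Vec], network: List[Tuple[int, int]]) -> List[Vec]:
    """Sort stage: comparator (i, j) is a vmin/vmax between registers i and j,
    so every lane (column) is sorted in parallel."""
    regs = [r[:] for r in regs]
    for i, j in network:
        regs[i], regs[j] = vmin(regs[i], regs[j]), vmax(regs[i], regs[j])
    return regs


def transpose(regs: List[Vec]) -> List[Vec]:
    """Transpose stage: sorted columns become sorted registers.
    (Hardware would use shuffle sequences; it is done directly here.)"""
    return [list(col) for col in zip(*regs)]


def bitonic_merge(a: Vec, b: Vec) -> Vec:
    """Merge stage: merge two ascending registers of equal power-of-two width."""
    w = len(a)
    rev = [[1 if j == w - 1 - i else 0 for j in range(w)] for i in range(w)]
    seq = a + permute(b, rev)  # ascending + descending = bitonic sequence
    d = w
    while d >= 1:
        lo_idx = [i for i in range(2 * w) if (i & d) == 0]
        lo = [seq[i] for i in lo_idx]
        hi = [seq[i + d] for i in lo_idx]
        # One lane-wise min and one lane-wise max per level of the network.
        for k, (x, y) in enumerate(zip(vmin(lo, hi), vmax(lo, hi))):
            seq[lo_idx[k]], seq[lo_idx[k] + d] = x, y
        d //= 2
    return seq


def sort16(data: Vec) -> Vec:
    """Sort 16 values held in four 4-lane registers."""
    w = 4
    regs = [data[i * w:(i + 1) * w] for i in range(w)]
    rows = transpose(sort_columns(regs, NET4))
    while len(rows) > 1:  # merge sorted runs pairwise: 4x4 -> 2x8 -> 1x16
        rows = [bitonic_merge(rows[k], rows[k + 1]) for k in range(0, len(rows), 2)]
    return rows[0]


# prints the integers 0 through 15 in ascending order
print(sort16([9, 2, 7, 4, 1, 8, 3, 6, 5, 0, 11, 10, 15, 12, 13, 14]))
```

On real hardware, each `vmin`/`vmax` pair would correspond to min/max intrinsics and each lane permutation to shuffle or permute instructions of the target ISA. Choosing and emitting those instructions automatically for a given sorting network and ISA is the part the abstract attributes to ASPaS.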
Pages: 383-392
Page count: 10