ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15
Authors
Hou, Kaixi [1]
Wang, Hao [1]
Feng, Wu-chun [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
Funding
National Science Foundation (USA);
Keywords
sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;
DOI
10.1145/2751205.2751247
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
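The abstract's idea of treating sort as a sequence of comparators applied to vector registers, followed by a transpose, can be sketched as follows. This is a minimal illustration only, not code generated by ASPaS: plain Python lists stand in for SIMD registers, and the names NETWORK_4, vmin, vmax, sort_columns, transpose, and in_register_sort are hypothetical. The 5-comparator, 4-input network used here is a standard textbook one, chosen as an assumption for the example.

```python
# Illustrative sketch only: lists emulate SIMD registers; this is not ASPaS output.

# A 4-input sorting network given as (i, j) comparators.
# After each comparator, register i holds the lane-wise min and j the max.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def vmin(a, b):
    """Lane-wise minimum of two emulated vector registers."""
    return [min(x, y) for x, y in zip(a, b)]


def vmax(a, b):
    """Lane-wise maximum of two emulated vector registers."""
    return [max(x, y) for x, y in zip(a, b)]


def sort_columns(regs, network):
    """Apply each comparator to whole registers, sorting every lane (column) at once."""
    regs = [list(r) for r in regs]  # work on a copy
    for i, j in network:
        lo, hi = vmin(regs[i], regs[j]), vmax(regs[i], regs[j])
        regs[i], regs[j] = lo, hi
    return regs


def transpose(regs):
    """Turn sorted columns into sorted rows."""
    return [list(col) for col in zip(*regs)]


def in_register_sort(regs, network=NETWORK_4):
    """Column-wise network sort followed by a transpose: each output row is sorted."""
    return transpose(sort_columns(regs, network))
```

Calling in_register_sort on four 4-lane registers returns four rows, each an ascending run built from one input column. In a real vectorized implementation, the min/max helpers and the transpose would be replaced by ISA intrinsics, and the sorted rows would then be fed to a merge stage.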
Pages: 383 - 392
Number of pages: 10
Related papers
50 in total
  • [31] Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors
    Li, Mingzhen
    Liu, Yi
    Yang, Hailong
    Hu, Yongmin
    Sun, Qingxiao
    Chen, Bangduo
    You, Xin
    Liu, Xiaoyan
    Luan, Zhongzhi
    Qian, Depei
    50TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2021,
  • [32] Parallel Evolutionary Algorithms for Stock Market Trading Rule Selection on Many-Core Graphics Processors
    Lipinski, Piotr
    NATURAL COMPUTING IN COMPUTATIONAL FINANCE, VOL 4, 2011, 380 : 79 - 92
  • [33] Design and Optimization of Parallel Algorithm for Kalman Filter on SW26010 Many-Core Processors
    Yang, Aiqiang
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (04)
  • [34] The Research On The Software Architecture Of Network Packet Processing Based On The Many-core Processors
    Wu Kehe
    Cheng Rui
    Zhang Yingqiang
    Mu Hongtao
    PROCEEDINGS OF 2016 IEEE 7TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2016), 2016, : 555 - 559
  • [35] MPI hardware framework for many-core based embedded systems
    Mendonca Pereira, Rodrigo Vinicius
    Seman, Laio Oriel
    Berejuck, Marcelo Daniel
    de Melo, Douglas Rossi
    Morales, Analucia Schiaffino
    Bezerra, Eduardo Augusto
    INTERNATIONAL JOURNAL OF SENSOR NETWORKS, 2021, 35 (01) : 42 - 56
  • [36] A novel sorting algorithm for many-core architectures based on adaptive bitonic sort
    Peters, Hagen
    Schulz-Hildebrandt, Ole
    Luttenberger, Norbert
    2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2012, : 227 - 237
  • [37] DAG Scheduling Considering Parallel Execution for High-Load Processing on Clustered Many-core Processors
    Okamura, Ryo
    Azumi, Takuya
    2022 IEEE/ACM 26TH INTERNATIONAL SYMPOSIUM ON DISTRIBUTED SIMULATION AND REAL TIME APPLICATIONS (DS-RT), 2022,
  • [38] Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors
    Jiang, Haiyang
    Zhang, Guangxing
    Xie, Gaogang
    Salamatian, Kave
    Mathy, Laurent
    2013 ACM/IEEE SYMPOSIUM ON ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS (ANCS), 2013, : 137 - 146
  • [39] Architecture supported synchronization-based cache coherence protocol for many-core processors
    Huang, He
    Liu, Lei
    Song, Feng-Long
    Ma, Xiao-Yu
    Jisuanji Xuebao/Chinese Journal of Computers, 2009, 32 (08) : 1618 - 1630
  • [40] A Parallel Many-core CUDA-based Graph Labeling Computation
    Quer, Stefano
    ICSOFT: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON SOFTWARE TECHNOLOGIES, 2020, : 597 - 605