ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15
Authors
Hou, Kaixi [1]
Wang, Hao [1]
Feng, Wu-chun [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
Funding
National Science Foundation (USA)
Keywords
sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;
DOI
10.1145/2751205.2751247
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
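As a rough illustration of the idea in the abstract, the sketch below treats a sorting network as a list of comparators and applies each comparator lane-wise to whole "registers", so that every column is sorted at once; a transpose then turns the sorted columns into sorted rows. This is a minimal sketch under stated assumptions, not code generated by ASPaS: plain Python lists stand in for vector registers, and the names `NETWORK_4`, `vmin`, `vmax`, `sort_columns`, and `transpose` are hypothetical. A real SIMD version would use ISA min/max and permute intrinsics instead.

```python
from typing import List, Sequence, Tuple

# Hypothetical illustration, not ASPaS output.
# A standard 4-input sorting network with 5 comparators. Each pair (i, j),
# i < j, means: keep the smaller value in row i and the larger in row j.
NETWORK_4: Sequence[Tuple[int, int]] = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def vmin(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise minimum; stands in for a vector min instruction."""
    return [min(x, y) for x, y in zip(a, b)]


def vmax(a: List[int], b: List[int]) -> List[int]:
    """Lane-wise maximum; stands in for a vector max instruction."""
    return [max(x, y) for x, y in zip(a, b)]


def sort_columns(rows: List[List[int]],
                 network: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Apply every comparator to whole rows, which sorts each column."""
    regs = [list(r) for r in rows]  # copy so the input is left unchanged
    for i, j in network:
        lo, hi = vmin(regs[i], regs[j]), vmax(regs[i], regs[j])
        regs[i], regs[j] = lo, hi
    return regs


def transpose(rows: List[List[int]]) -> List[List[int]]:
    """Swap rows and columns so each sorted column becomes a sorted row."""
    return [list(col) for col in zip(*rows)]


# Usage: four "registers" of four lanes each.
rows = [[9, 3, 7, 1], [4, 8, 2, 6], [5, 0, 11, 10], [12, 15, 13, 14]]
runs = transpose(sort_columns(rows, NETWORK_4))
```

Each row of `runs` is one sorted sequence of four elements; merging such sequences into longer ones is the step the abstract refers to as the merge function and is not shown here.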
Pages: 383-392
Number of pages: 10
Related papers
50 in total
  • [11] Fast parallel stream compaction for IA-based multi/many-core processors
    Sun, Qiao
    Yang, Chao
    Wu, Changmao
    Li, Leisheng
    Liu, Fangfang
    2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016, : 736 - 745
  • [12] Queuing Ports for Mesh Based Many-Core Processors
    Villaescusa, D. G.
    Rivas, M. A.
    Harbour, M. G.
    Ada User Journal, 2021, 42 (3-4): : 189 - 192
  • [13] Highly scalable parallel genetic algorithm on Sunway many-core processors
    Xiao, Zhiyong
    Liu, Xu
    Xu, Jingheng
    Sun, Qingxiao
    Gan, Lin
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2021, 114 : 679 - 691
  • [14] Parallel Dense Gauss-Seidel Algorithm on Many-Core Processors
    Courtecuisse, Hadrien
    Allard, Jeremie
    HPCC: 2009 11TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2009, : 139 - 147
  • [15] POSTER: Reducing the Burden of Parallel Loop Schedulers for Many-Core Processors
    Arif, Mahwish
    Vandierendonck, Hans
    ACM SIGPLAN NOTICES, 2018, 53 (01) : 383 - 384
  • [16] Sesame: A User-Transparent Optimizing Framework for Many-Core Processors
    Fang, Jianbin
    Varbanescu, Ana Lucia
    Sips, Henk
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013, : 70 - 73
  • [17] A many-core based parallel tabu search
    Lam, Yuet M.
    Luk, Wayne
    International Journal of Computers and Applications, 2014, 36 (01) : 15 - 22
  • [18] The Research on The CPU Intelligent Scheduling Based On The Many-core Processors
    Shao Zuozhi
    Zhang Yingqiang
    Mu Hongtao
    Cheng Rui
    PROCEEDINGS OF 2016 IEEE 7TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2016), 2016, : 779 - 782
  • [19] Regional cache organization for NoC based many-core processors
    Ye, John M.
    Cao, Man
    Qu, Zening
    Chen, Tianzhou
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2013, 79 (02) : 175 - 186
  • [20] Parallel Monte Carlo Tree Search from Multi-core to Many-core Processors
    Mirsoleimani, S. Ali
    Plaat, Aske
    van den Herik, Jaap
    Vermaseren, Jos
    2015 IEEE TRUSTCOM/BIGDATASE/ISPA, VOL 3, 2015, : 77 - 83