ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15
Authors
Hou, Kaixi [1]
Wang, Hao [1]
Feng, Wu-chun [1]
Affiliations
[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA
Funding
National Science Foundation (USA)
Keywords
sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;
DOI
10.1145/2751205.2751247
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
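The abstract's central idea, that a sort can be written as a fixed sequence of comparators and then mapped onto vector operations, can be illustrated with a small sketch. This is not ASPaS output and does not use real ISA intrinsics: plain Python lists stand in for vector registers, and the names `NETWORK_4`, `apply_network`, `transpose`, and `sort_columns_then_transpose` are hypothetical. The merge stage described in the abstract is not shown.

```python
from typing import List, Sequence, Tuple

# A comparator (i, j) orders inputs i and j so the smaller ends up at i.
Comparator = Tuple[int, int]

# A well-known 5-comparator sorting network for 4 inputs (assumed example,
# not taken from the paper).
NETWORK_4: List[Comparator] = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def apply_network(rows: Sequence[Sequence[int]],
                  network: Sequence[Comparator]) -> List[List[int]]:
    """Apply each comparator lane-wise across whole 'registers'.

    Each row plays the role of one vector register. A comparator (i, j)
    becomes one lane-wise min and one lane-wise max, which is how a
    comparator maps onto SIMD min/max operations in principle.
    """
    regs = [list(r) for r in rows]
    for i, j in network:
        lo = [min(a, b) for a, b in zip(regs[i], regs[j])]
        hi = [max(a, b) for a, b in zip(regs[i], regs[j])]
        regs[i], regs[j] = lo, hi
    return regs


def transpose(regs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Turn sorted columns (lanes) into sorted rows."""
    return [list(col) for col in zip(*regs)]


def sort_columns_then_transpose(rows: Sequence[Sequence[int]],
                                network: Sequence[Comparator]) -> List[List[int]]:
    """Sort every lane with the network, then transpose to get sorted rows."""
    return transpose(apply_network(rows, network))


if __name__ == "__main__":
    data = [[9, 4, 7, 1], [3, 8, 2, 6], [5, 0, 11, 10], [1, 13, 14, 15]]
    for row in sort_columns_then_transpose(data, NETWORK_4):
        print(row)
```

After the network runs, each lane (column) is sorted from top to bottom, so the transpose yields one sorted sequence per row. In a real SIMD implementation the comparator step would use the target ISA's vector min/max instructions and the transpose would use shuffle or permute instructions; which ones ASPaS selects is determined by its pattern pool and is not reproduced here.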
Pages: 383-392
Number of pages: 10
Related papers
50 in total
  • [21] A framework to the design and programming of many-core focal-plane vision processors
    Mori, Jones Y.
    Llanos, Carlos
    Huebner, Michael
    PROCEEDINGS IEEE/IFIP 13TH INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING 2015, 2015, : 193 - 198
  • [22] LARRABEE: A MANY-CORE X86 ARCHITECTURE FOR VISUAL COMPUTING
    Seiler, Larry
    Carmean, Doug
    Sprangle, Eric
    Forsyth, Tom
    Dubey, Pradeep
    Junkins, Stephen
    Lake, Adam
    Cavin, Robert
    Espasa, Roger
    Grochowski, Ed
    Juan, Toni
    Abrash, Michael
    Sugerman, Jeremy
    Hanrahan, Pat
    IEEE MICRO, 2009, 29 (01) : 10 - 21
  • [23] Larrabee: A many-core x86 architecture for visual computing
    Seiler, Larry
    Carmean, Doug
    Sprangle, Eric
    Forsyth, Tom
    Abrash, Michael
    Dubey, Pradeep
    Junkins, Stephen
    Lake, Adam
    Sugerman, Jeremy
    Cavin, Robert
    Espasa, Roger
    Grochowski, Ed
    Juan, Toni
    Hanrahan, Pat
    ACM TRANSACTIONS ON GRAPHICS, 2008, 27 (03):
  • [24] Parallel Path Delay Fault Simulation for Multi/Many-Core Processors with SIMD Units
    Ali, Yussuf
    Yamato, Yuta
    Yoneda, Tomokazu
    Hatayama, Kazumi
    Inoue, Michiko
    2014 IEEE 23RD ASIAN TEST SYMPOSIUM (ATS), 2014, : 292 - 297
  • [25] A Scalable Parallel Partition Tridiagonal Solver for Many-Core and Low B/F Processors
    Mitsuda, Tatsuya
    Ono, Kenji
    2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022, : 860 - 869
  • [26] Partition-Based Hardware Transactional Memory for Many-Core Processors
    Liu, Yi
    Zhang, Xinwei
    Wang, Yonghui
    Qian, Depei
    Chen, Yali
    Wu, Jin
    NETWORK AND PARALLEL COMPUTING, NPC 2013, 2013, 8147 : 308 - 321
  • [27] Latency Analysis of Network-On-Chip based Many-Core Processors
    Kumar, Sunil
    Lipari, Giuseppe
    2014 22ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED, AND NETWORK-BASED PROCESSING (PDP 2014), 2014, : 432 - 439
  • [28] Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform
    Yan, Chenggang
    Zhang, Yongdong
    Dai, Feng
    Li, Liang
    2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 63 - 72
  • [29] Efficient Parallel Framework for HEVC Deblocking Filter on Many-core Platform
    Yan, Chenggang
    Zhang, Yongdong
    Dai, Feng
    Li, Liang
    2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 530 - 530
  • [30] GRapid: a Compilation and Runtime Framework for Rapid Prototyping of Graph Applications on Many-core Processors
    Li, Da
    Chakradhar, Srimat
    Becchi, Michela
    2014 20TH IEEE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2014, : 174 - 182