ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15

Authors:

Hou, Kaixi [1]

Wang, Hao [1]

Feng, Wu-chun [1]

Affiliation:

[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15) | 2015

Funding:

U.S. National Science Foundation;

Keywords:

sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;

DOI:

10.1145/2751205.2751247

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
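The pipeline the abstract describes (sort with a comparator network, transpose, then merge) can be sketched in plain Python under several assumptions. This is not ASPaS output or its API: "vector registers" are simulated as Python lists of an assumed width of 4, the 5-comparator network `NETWORK_4` and every function name are hypothetical, and the bitonic merge is a common choice used here for illustration. Each shuffle is written as a permutation-matrix product only to mirror the vector-matrix formalization; generated code would instead emit a single shuffle or permute intrinsic of the target ISA.

```python
from typing import List, Sequence, Tuple

Vector = List[int]            # one simulated SIMD register (hypothetical model)
Comparator = Tuple[int, int]

WIDTH = 4  # assumed lane count; real widths depend on the ISA and element type
# Hypothetical input network: an optimal 5-comparator sorting network for 4 inputs.
NETWORK_4: List[Comparator] = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def vmin(a: Vector, b: Vector) -> Vector:
    """Lane-wise minimum, standing in for a SIMD min instruction."""
    return [min(x, y) for x, y in zip(a, b)]


def vmax(a: Vector, b: Vector) -> Vector:
    """Lane-wise maximum, standing in for a SIMD max instruction."""
    return [max(x, y) for x, y in zip(a, b)]


def sort_columns(regs: List[Vector], network: Sequence[Comparator]) -> List[Vector]:
    """Apply each comparator to whole registers, sorting every lane (column) at once."""
    regs = [list(r) for r in regs]
    for i, j in network:
        regs[i], regs[j] = vmin(regs[i], regs[j]), vmax(regs[i], regs[j])
    return regs


def transpose(regs: List[Vector]) -> List[Vector]:
    """Turn sorted columns into sorted registers (rows)."""
    return [list(col) for col in zip(*regs)]


def permutation_matrix(perm: Sequence[int]) -> List[List[int]]:
    """0/1 matrix P such that (P v)[k] == v[perm[k]]."""
    n = len(perm)
    return [[1 if c == perm[r] else 0 for c in range(n)] for r in range(n)]


def matvec(m: List[List[int]], v: Vector) -> Vector:
    """Matrix-vector product; with a permutation matrix this is a shuffle."""
    return [sum(m_rc * v_c for m_rc, v_c in zip(row, v)) for row in m]


def bitonic_merge(a: Vector, b: Vector) -> Vector:
    """Merge two ascending vectors whose combined length is a power of two."""
    seq = a + b[::-1]  # ascending followed by descending: a bitonic sequence
    n = len(seq)
    d = n // 2
    while d >= 1:
        partner = [k ^ d for k in range(n)]  # lane k is compared with lane k XOR d
        shuffled = matvec(permutation_matrix(partner), seq)
        lo, hi = vmin(seq, shuffled), vmax(seq, shuffled)
        # Blend by lane mask: the lower lane of each pair keeps the min, the upper the max.
        seq = [lo[k] if k & d == 0 else hi[k] for k in range(n)]
        d //= 2
    return seq


def simd_style_sort(values: Vector) -> Vector:
    """Sort WIDTH * WIDTH values: column sort, transpose, then pairwise merges."""
    assert len(values) == WIDTH * WIDTH
    # WIDTH registers of WIDTH lanes each, so the transpose is square.
    regs = [values[r * WIDTH:(r + 1) * WIDTH] for r in range(WIDTH)]
    runs = transpose(sort_columns(regs, NETWORK_4))
    while len(runs) > 1:
        runs = [bitonic_merge(runs[i], runs[i + 1]) for i in range(0, len(runs), 2)]
    return runs[0]


if __name__ == "__main__":
    data = [13, 2, 9, 7, 0, 15, 4, 11, 6, 1, 14, 3, 8, 12, 5, 10]
    print(simd_style_sort(data))
```

Because each comparator `(i, j)` acts on whole registers, one lane-wise min/max pair sorts all four columns at once; that data-parallel mapping is what makes sorting networks a natural fit for SIMD. The O(n^2) `matvec` is only a notational device for the shuffle, not an efficient way to permute lanes.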

Pages: 383-392

Page count: 10
