ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15

Authors:

Hou, Kaixi [1]

Wang, Hao [1]

Feng, Wu-chun [1]

Affiliations:

[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15) | 2015

Funding:

National Science Foundation (USA);

Keywords:

sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;

DOI:

10.1145/2751205.2751247

Chinese Library Classification (CLC) number:

TP3 [Computing Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
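The abstract describes the sort stage as a sequence of comparators followed by a transpose. As a rough illustration only (this is not code generated by ASPaS, and it uses no real ISA intrinsics), the sketch below models each vector register as a plain Python list and each comparator as a lane-wise min/max pair. The names `NETWORK_4`, `compare_exchange`, `sort_columns`, `transpose`, and `sort_block` are hypothetical, and the 5-comparator network for 4 inputs is a standard textbook network chosen for the example, not one taken from the paper.

```python
# Hypothetical sketch: a sorting network applied lane-wise to "registers".
# Plain lists stand in for SIMD vectors; min/max stand in for vector min/max.

# Standard 5-comparator sorting network for 4 inputs (assumed for illustration).
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def compare_exchange(regs, i, j):
    """Lane-wise comparator: regs[i] gets the minima, regs[j] the maxima."""
    lo = [min(a, b) for a, b in zip(regs[i], regs[j])]
    hi = [max(a, b) for a, b in zip(regs[i], regs[j])]
    regs[i], regs[j] = lo, hi


def sort_columns(regs, network=NETWORK_4):
    """Run every comparator of the network; each column ends up sorted."""
    regs = [list(r) for r in regs]  # work on a copy, leave the input intact
    for i, j in network:
        compare_exchange(regs, i, j)
    return regs


def transpose(regs):
    """Turn sorted columns into sorted rows (one sorted run per register)."""
    return [list(col) for col in zip(*regs)]


def sort_block(regs):
    """Sort stage followed by transpose stage."""
    return transpose(sort_columns(regs))


block = [
    [9, 3, 7, 1],
    [4, 8, 2, 6],
    [5, 0, 11, 10],
    [12, 15, 14, 13],
]
sorted_rows = sort_block(block)
# Each row of sorted_rows is now an ascending run that a merge stage could combine.
```

Because the comparators operate on whole registers, the same five steps sort all four columns at once, which is the data-parallel property a vectorized implementation would exploit.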


Pages: 383-392

Page count: 10
