ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors

Cited by: 15

Authors:

Hou, Kaixi [1]

Wang, Hao [1]

Feng, Wu-chun [1]

Affiliations:

[1] Virginia Tech, Dept Comp Sci, Blacksburg, VA 24060 USA

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15) | 2015

Funding:

U.S. National Science Foundation;

Keywords:

sort; merge; transpose; vectorization; SIMD; ISA; MIC; AVX; AVX-512;

DOI:

10.1145/2751205.2751247

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Due to the difficulty that modern compilers have in vectorizing applications on vector-extension architectures, programmers resort to manually programming vector registers with intrinsics in order to achieve better performance. However, the continued growth in the width of registers and the evolving library of intrinsics make such manual optimizations tedious and error-prone. Hence, we propose a framework for the Automatic SIMDization of Parallel Sorting (ASPaS) on x86-based multicore and manycore processors. That is, ASPaS takes any sorting network and a given instruction set architecture (ISA) as inputs and automatically generates vectorized code for that sorting network. By formalizing the sort function as a sequence of comparators and the transpose and merge functions as sequences of vector-matrix multiplications, ASPaS can map these functions to operations from a selected "pattern pool" that is based on the characteristics of parallel sorting, and then generate the vectorized code with the real ISA intrinsics. The performance evaluation of our ASPaS framework on the Intel Xeon Phi coprocessor illustrates that automatically generated sorting codes from ASPaS can outperform the sorting implementations from STL, Boost, and Intel TBB.
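To make the abstract's phrase "a sequence of comparators" concrete, the following is a minimal sketch, not the ASPaS implementation or its generated code. It emulates a vector register with a fixed-width array (the width `kLanes = 4`, the type `Vec`, and the function names are all illustrative assumptions) and applies a well-known 5-comparator network for four inputs. Each comparator becomes a lane-wise min/max pair, which on real hardware would correspond to vector min and max instructions. After the network runs, every column is sorted; the transpose and merge stages mentioned in the abstract are not shown.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One "vector register" emulated as a fixed-width array of lanes.
// The width 4 is an illustrative assumption, not a value from the paper.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;

// A comparator (i, j) orders rows i and j lane by lane:
// row i receives the element-wise minimum, row j the maximum.
using Comparator = std::pair<std::size_t, std::size_t>;

// Lane-wise compare-exchange (hypothetical helper). With real SIMD
// intrinsics this loop would be one vector min and one vector max.
inline void compare_exchange(Vec& lo, Vec& hi) {
    for (std::size_t k = 0; k < kLanes; ++k) {
        const int a = std::min(lo[k], hi[k]);
        const int b = std::max(lo[k], hi[k]);
        lo[k] = a;
        hi[k] = b;
    }
}

// Apply a sorting network, given as a sequence of comparators, to the rows.
// Afterwards each lane (column) is sorted from the first row to the last.
inline void apply_network(std::vector<Vec>& rows,
                          const std::vector<Comparator>& network) {
    for (const Comparator& c : network) {
        assert(c.first < rows.size() && c.second < rows.size());
        compare_exchange(rows[c.first], rows[c.second]);
    }
}

// A standard 5-comparator sorting network for 4 inputs.
inline std::vector<Comparator> four_input_network() {
    return {{0, 1}, {2, 3}, {0, 2}, {1, 3}, {1, 2}};
}
```

Because the comparator sequence is fixed and independent of the data, the same sequence applies to every lane at once, which is what makes sorting networks a natural fit for SIMD code generation.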


Pages: 383-392

Page count: 10

