Faster Population Counts Using AVX2 Instructions

Cited by: 30

Authors:

Mula, Wojciech [1]

Kurz, Nathan [1]

Lemire, Daniel [1]

Affiliation:

[1] Univ Quebec TELUQ, 5800 St Denis, Montreal, PQ H2S 3L5, Canada

Source:

COMPUTER JOURNAL | 2018 / Vol. 61 / No. 01

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

software performance; SIMD instructions; vectorization; bitset; Jaccard index;

DOI:

10.1093/comjnl/bxx046

Chinese Library Classification (CLC):

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).
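
To make the two operations named in the abstract concrete, here is a minimal scalar sketch in C of a population count and of a Jaccard index over bitsets. It is not the AVX2 method of the paper; it only shows the kind of word-at-a-time baseline that such a method would be compared against. The function names `popcount64` and `jaccard_index` are hypothetical, and returning 0.0 when both bitsets are empty is an assumed convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 64-bit population count using a bit-parallel reduction.
 * Illustrative baseline only; not the vectorized AVX2 approach. */
uint64_t popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL)
      + ((x >> 2) & 0x3333333333333333ULL);               /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;           /* 8-bit sums */
    return (x * 0x0101010101010101ULL) >> 56;             /* add the bytes */
}

/* Jaccard index |A AND B| / |A OR B| of two bitsets, each stored as
 * n 64-bit words. Assumed convention: two empty bitsets yield 0.0. */
double jaccard_index(const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t inter = 0, uni = 0;
    for (size_t i = 0; i < n; i++) {
        inter += popcount64(a[i] & b[i]);  /* extra Boolean op: AND */
        uni   += popcount64(a[i] | b[i]);  /* extra Boolean op: OR  */
    }
    return uni == 0 ? 0.0 : (double)inter / (double)uni;
}
```

Each word needs one AND, one OR and two population counts, which is the sort of additional Boolean work the abstract refers to for similarity measures.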


Pages: 111-120

Page count: 10

