Faster Population Counts Using AVX2 Instructions

Cited by: 30
Authors
Mula, Wojciech [1 ]
Kurz, Nathan [1 ]
Lemire, Daniel [1 ]
Affiliations
[1] Univ Quebec TELUQ, 5800 St Denis, Montreal, PQ H2S 3L5, Canada
Source
COMPUTER JOURNAL | 2018 / Vol. 61 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
software performance; SIMD instructions; vectorization; bitset; Jaccard index;
DOI
10.1093/comjnl/bxx046
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).
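To make the two operations named in the abstract concrete, here is a minimal scalar sketch in Python. It is not the AVX2 method the paper describes: it uses the classic bit-parallel (SWAR) reduction as a stand-in for a per-word population count, then combines it with AND and OR to compute a Jaccard index over two bitsets. The function names, the representation of a bitset as a list of 64-bit words, and the choice to return 1.0 for two empty sets are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # Python ints are unbounded, so 64-bit words are emulated


def popcount64(x: int) -> int:
    """Count the set bits in one 64-bit word (classic SWAR reduction)."""
    x &= MASK64
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Multiplying accumulates all byte sums into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


def popcount_words(words) -> int:
    """Population count of a bitset stored as a sequence of 64-bit words."""
    return sum(popcount64(w) for w in words)


def jaccard(a, b) -> float:
    """Jaccard index |A AND B| / |A OR B| of two equal-length bitsets."""
    if len(a) != len(b):
        raise ValueError("bitsets must have the same number of words")
    inter = sum(popcount64(x & y) for x, y in zip(a, b))
    union = sum(popcount64(x | y) for x, y in zip(a, b))
    # Assumption: two empty sets are treated as identical.
    return inter / union if union else 1.0
```

The Jaccard computation needs a Boolean operation on every pair of words before each count, which is the kind of extra per-word work the abstract refers to when it mentions similarity measures.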
Pages: 111-120
Page count: 10