Faster Population Counts Using AVX2 Instructions

Cited by: 30
Authors
Mula, Wojciech [1]
Kurz, Nathan [1]
Lemire, Daniel [1]
Affiliations
[1] Univ Quebec TELUQ, 5800 St Denis, Montreal, PQ H2S 3L5, Canada
Source
COMPUTER JOURNAL | 2018 / Vol. 61 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
software performance; SIMD instructions; vectorization; bitset; Jaccard index;
DOI
10.1093/comjnl/bxx046
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).
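As a rough illustration of the two operations the abstract mentions, the sketch below counts the ones in a bitset stored as 64-bit words and derives a Jaccard index from the counts of the bitwise AND and OR. It is a plain scalar baseline, not the vectorized AVX2 method of the paper. The function names `popcount_words` and `jaccard_index` are hypothetical, and returning 1.0 for two empty bitsets is an assumed convention.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumption: each bitset word is 64 bits wide


def popcount_words(words):
    """Count the set bits across an iterable of 64-bit words (scalar baseline)."""
    return sum(bin(w & MASK64).count("1") for w in words)


def jaccard_index(a, b):
    """Jaccard index |A AND B| / |A OR B| of two bitsets with equal word counts."""
    if len(a) != len(b):
        raise ValueError("bitsets must have the same number of words")
    # The "additional Boolean operations": AND for the intersection, OR for the union.
    intersection = popcount_words(x & y for x, y in zip(a, b))
    union = popcount_words(x | y for x, y in zip(a, b))
    # Assumed convention: two empty sets are treated as identical.
    return intersection / union if union else 1.0
```

For example, `jaccard_index([0b1100], [0b1010])` has one shared bit and three bits in the union, so it evaluates to 1/3.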
Pages: 111 - 120
Number of pages: 10
Related papers
50 in total
  • [21] High-Speed AVX2 Implementation of AKCN-MLWE
    Yang H.
    Liu Z.
    Huang J.-H.
    Shen S.-Y.
    Zhao Y.-L.
    Liu, Zhe (zhe.liu@nuaa.edu.cn), 1600, Science Press (44): : 2560 - 2572
  • [22] Fast Quicksort Implementation Using AVX Instructions
    Gueron, Shay
    Krasnov, Vlad
    COMPUTER JOURNAL, 2016, 59 (01): : 83 - 90
  • [23] Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4
    Beullens W.
    Campos F.
    Celi S.
    Hess B.
    Kannwischer M.J.
    IACR Transactions on Cryptographic Hardware and Embedded Systems, 2024, 2024 (02): : 252 - 275
  • [24] Efficient computation of positional population counts using SIMD instructions
    Klarqvist, Marcus D. R.
    Mula, Wojciech
    Lemire, Daniel
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (17):
  • [25] QCKer: An x86-AVX/AVX2 Implementation of Q-gram Counting Filter for DNA Sequence Alignment
    Pernez, Joven L., Jr.
    Borja, Kaizen Vinz A.
    Uy, Roger Luis
    Maghirang, Jan Carlo G.
    PROCEEDINGS OF 2019 6TH INTERNATIONAL CONFERENCE BIOINFORMATICS RESEARCH AND APPLICATIONS (ICBRA 2019), 2019, : 49 - 54
  • [26] Parallel Implementation of SM2 Elliptic Curve Cryptography on Intel Processors with AVX2
    Huang, Junhao
    Liu, Zhe
    Hu, Zhi
    Grossschadl, Johann
    INFORMATION SECURITY AND PRIVACY, ACISP 2020, 2020, 12248 : 204 - 224
  • [27] Acceleration of Multiple Precision Matrix Multiplication Based on Multi-component Floating-Point Arithmetic Using AVX2
    Kouya, Tomonori
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2021, PT V, 2021, 12953 : 202 - 217
  • [28] Fast LTE DFT and IDFT Algorithms Based on Intel AVX2
    Cao Ruqiu (曹如球)
    Information & Communications (信息通信), 2014, (08) : 11 - 12
  • [29] Vectorization of Flat Loops of Arbitrary Structure Using Instructions AVX-512
    G. I. Savin
    B. M. Shabanov
    A. A. Rybakov
    S. S. Shumilin
    Lobachevskii Journal of Mathematics, 2020, 41 : 2575 - 2592
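  • [30] Hybrid Computation Strategy for Deep Learning Based on the AVX2 Instruction Set
    Jiang Wenbin (蒋文斌)
    Wang Hongbin (王宏斌)
    Liu Pai (刘湃)
    Chen Yuhao (陈雨浩)
    Journal of Tsinghua University (Science and Technology), 2020, 60 (05) : 408 - 414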