Faster Population Counts Using AVX2 Instructions

Cited by: 30
Authors
Mula, Wojciech [1]
Kurz, Nathan [1]
Lemire, Daniel [1]
Affiliation
[1] Univ Quebec TELUQ, 5800 St Denis, Montreal, PQ H2S 3L5, Canada
Source
COMPUTER JOURNAL | 2018 / Vol. 61 / No. 01
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
software performance; SIMD instructions; vectorization; bitset; Jaccard index
DOI
10.1093/comjnl/bxx046
CLC number
TP3 [computing technology, computer technology]
Discipline code
0812
Abstract
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word (e.g. popcnt on x64 processors). Maybe surprisingly, we show that a vectorized approach using SIMD instructions can be twice as fast as using the dedicated instructions on recent Intel processors. The benefits can be even greater for applications such as similarity measures (e.g. the Jaccard index) that require additional Boolean operations. Our approach has been adopted by LLVM: it is used by its popular C compiler (Clang).
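The abstract does not spell out the vectorized method. As a minimal sketch, assuming GCC or Clang on x86-64, the C code below shows one well-known AVX2 way to count ones: split every byte into two 4-bit nibbles, look up each nibble's bit count with `vpshufb` (`_mm256_shuffle_epi8`), and fold the per-byte counts into 64-bit sums with `vpsadbw` (`_mm256_sad_epu8`). It is a simplified illustration and not necessarily the tuned kernel benchmarked in the paper. The function names `popcount_scalar`, `popcount_avx2_lookup`, `popcount_bytes` and `jaccard_index` are hypothetical. The Jaccard helper uses plain scalar `__builtin_popcountll` only to show where the extra Boolean operations (AND, OR) enter. It also assumes the convention that two empty sets have similarity 1.0.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define HAVE_X86_AVX2_PATH 1
#endif

/* Scalar baseline: one popcount per 64-bit word (popcnt when the target has it). */
static uint64_t popcount_scalar(const uint8_t *data, size_t n) {
    uint64_t total = 0;
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, data + i, sizeof w);          /* alignment-safe load */
        total += (uint64_t)__builtin_popcountll(w);
    }
    for (; i < n; i++)                           /* leftover bytes */
        total += (uint64_t)__builtin_popcount(data[i]);
    return total;
}

#ifdef HAVE_X86_AVX2_PATH
/* Hypothetical AVX2 kernel using a 16-entry nibble lookup table. */
__attribute__((target("avx2")))
static uint64_t popcount_avx2_lookup(const uint8_t *data, size_t n) {
    /* Bit counts of 0..15, repeated in both 128-bit lanes because
       vpshufb only looks up within its own lane. */
    const __m256i lookup = _mm256_setr_epi8(
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4);
    const __m256i low_mask = _mm256_set1_epi8(0x0f);
    const __m256i zero = _mm256_setzero_si256();
    __m256i acc = zero;                          /* four 64-bit partial sums */
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i v  = _mm256_loadu_si256((const __m256i *)(data + i));
        __m256i lo = _mm256_and_si256(v, low_mask);
        __m256i hi = _mm256_and_si256(_mm256_srli_epi16(v, 4), low_mask);
        /* Per-byte popcount (at most 8, so it fits in a byte). */
        __m256i cnt = _mm256_add_epi8(_mm256_shuffle_epi8(lookup, lo),
                                      _mm256_shuffle_epi8(lookup, hi));
        /* Sum each group of 8 byte counts into one 64-bit lane. */
        acc = _mm256_add_epi64(acc, _mm256_sad_epu8(cnt, zero));
    }
    uint64_t lanes[4];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]
         + popcount_scalar(data + i, n - i);     /* tail shorter than 32 bytes */
}
#endif

/* Count the ones in n bytes, using AVX2 only if the running CPU supports it. */
uint64_t popcount_bytes(const uint8_t *data, size_t n) {
#ifdef HAVE_X86_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        return popcount_avx2_lookup(data, n);
#endif
    return popcount_scalar(data, n);
}

/* Jaccard index |A & B| / |A | B| of two bitsets stored as 64-bit words. */
double jaccard_index(const uint64_t *a, const uint64_t *b, size_t nwords) {
    uint64_t inter = 0, uni = 0;
    for (size_t i = 0; i < nwords; i++) {
        inter += (uint64_t)__builtin_popcountll(a[i] & b[i]);
        uni   += (uint64_t)__builtin_popcountll(a[i] | b[i]);
    }
    return uni == 0 ? 1.0 : (double)inter / (double)uni;  /* assumed convention */
}
```

The runtime check with `__builtin_cpu_supports` lets the same binary run on CPUs without AVX2. The per-iteration `vpsadbw` keeps the sketch short and overflow-free. A faster kernel would typically let the byte counts accumulate for several iterations before widening them.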
Pages: 111-120
Number of pages: 10
Related papers (50 in total)
  • [1] Faster Base64 Encoding and Decoding Using AVX2 Instructions
    Mula, Wojciech
    Lemire, Daniel
    ACM TRANSACTIONS ON THE WEB, 2018, 12 (03)
  • [2] String searching with mismatches using AVX2 and AVX-512 instructions
    Chhabra, Tamanna
    Ghuman, Sukhpal Singh
    Tarhio, Jorma
    INFORMATION PROCESSING LETTERS, 2025, 189
  • [3] SIMD vectorization for the Lennard-Jones potential with AVX2 and AVX-512 instructions
    Watanabe, Hiroshi
    Nakagawa, Koh M.
    COMPUTER PHYSICS COMMUNICATIONS, 2019, 237 : 1 - 7
  • [4] An Exploration of Using the Intel AVX2 Gather Load Instructions for Vectorised Image Processing
    Cree, Michael J.
    2018 INTERNATIONAL CONFERENCE ON IMAGE AND VISION COMPUTING NEW ZEALAND (IVCNZ), 2018
  • [5] High-Throughput Elliptic Curve Cryptography Using AVX2 Vector Instructions
    Cheng, Hao
    Grossschaedl, Johann
    Tian, Jiaqi
    Ronne, Peter B.
    Ryan, Peter Y. A.
    SELECTED AREAS IN CRYPTOGRAPHY, 2021, 12804 : 698 - 719
  • [6] Fair Scheduling for AVX2 and AVX-512 Workloads
    Gottschlag, Mathias
    Machauer, Philipp
    Khalil, Yussuf
    Bellosa, Frank
    PROCEEDINGS OF THE 2021 USENIX ANNUAL TECHNICAL CONFERENCE, 2021 : 745 - 758
  • [7] Optimizing Dilithium Implementation with AVX2/-512
    Xu, Runqing
    He, Debiao
    Luo, Min
    Peng, Cong
    Zeng, Xiangyong
    ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2024, 23 (06)
  • [8] Fast Implementation of Curve25519 Using AVX2
    Faz-Hernandez, Armando
    Lopez, Julio
    PROGRESS IN CRYPTOLOGY - LATINCRYPT 2015, 2015, 9230 : 329 - 345
  • [9] Fast Implementation of Simeck Family Block Ciphers Using AVX2
    Park, Taehwan
    Seo, Hwajeong
    Kim, Howon
    2018 INTERNATIONAL CONFERENCE ON PLATFORM TECHNOLOGY AND SERVICE (PLATCON18), 2018 : 208 - 211
  • [10] Accelerating a Geometrical Approximated PCA Algorithm Using AVX2 and CUDA
    Machidon, Alina L.
    Machidon, Octavian M.
    Ciobanu, Catalin B.
    Ogrutan, Petre L.
    REMOTE SENSING, 2020, 12 (12)