Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

被引:10
|
作者
Wang, Yiqiu [1 ]
Shrivastava, Anshumali [1 ]
Wang, Jonathan [1 ]
Ryu, Junghee [1 ]
机构
[1] Rice Univ, Houston, TX 77251 USA
关键词
Similarity search; locality sensitive hashing; reservoir sampling; GPGPU;
D O I
10.1145/3183713.3196925
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging a LSH style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force (n(2)D), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results(1).
引用
收藏
页码:889 / 903
页数:15
相关论文
共 22 条
  • [21] Learning Laparoscopic Surgery Skills With a 4K Ultra-High Definition 2D vs a Three-Dimensional HD Laparoscopic System: Results From a Prospective Randomized Trial
    Thomaschewski, Michael
    Juergens, Thorsten
    Keck, Tobias
    Laubert, Tilman
    Benecke, Claudia
    SURGICAL INNOVATION, 2021, 28 (06) : 760 - 767
  • [22] Acquiring basic and advanced laparoscopic skills in novices using two-dimensional (2D), three-dimensional (3D) and ultra-high definition (4K) vision systems: A randomized control study
    Abdelrahman, M.
    Belramman, A.
    Salem, R.
    Patel, B.
    INTERNATIONAL JOURNAL OF SURGERY, 2018, 53 : 333 - 338