High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics

Authors
Baumann, Philipp [1]
Hochbaum, Dorit S. [2]
Spaen, Quico [2]
Affiliations
[1] Univ Bern, Dept Business Adm, Bern, Switzerland
[2] Univ Calif Berkeley, IEOR Dept, Berkeley, CA 94720 USA
Keywords
Big data; similarity-based machine learning; sparsification; sparse computation; computational geometry
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Several leading supervised and unsupervised machine learning algorithms require as input similarities between objects in a data set. Since the number of pairwise similarities grows quadratically with the size of the data set, it is computationally prohibitive to compute all pairwise similarities for large-scale data sets. The recently introduced methodology of "sparse computation" resolves this issue by computing only the relevant similarities instead of all pairwise similarities. To identify the relevant similarities, sparse computation efficiently projects the data onto a low-dimensional space where a similarity is considered relevant if the corresponding objects are close in this space. The relevant similarities are then computed in the original space. Sparse computation identifies close pairs by partitioning the low-dimensional space into grid blocks, and considering objects close if they fall in the same or adjacent grid blocks. This guarantees that all pairs of objects that are within a specified L-infinity distance are identified as well as some pairs that are within twice this distance. For very large data sets, sparse computation can have high runtime due to the enumeration of pairs of adjacent blocks. We propose here new geometric algorithms that eliminate the need to enumerate adjacent blocks. Our empirical results on data sets with up to 10 million objects show that the new algorithms achieve a significant reduction in runtime. The algorithms have applications in large-scale computational geometry and (approximate) nearest neighbor search. Python implementations of the proposed algorithms are publicly available.
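The grid-block idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code and not the new geometric algorithms the paper proposes; it is the baseline approach that enumerates adjacent blocks, written under the assumption that the points have already been projected onto the low-dimensional space. The function name `grid_close_pairs` and the parameter `width` (the grid block side length) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product
from math import floor


def grid_close_pairs(points, width):
    """Return index pairs (i, j), i < j, whose points fall in the same
    or in adjacent grid blocks of side length `width` (hypothetical name)."""
    # Assign every point to the grid block (integer cell coordinates) it falls in.
    blocks = defaultdict(list)
    for idx, point in enumerate(points):
        block_id = tuple(floor(x / width) for x in point)
        blocks[block_id].append(idx)

    dim = len(points[0]) if points else 0
    # All neighbor offsets in {-1, 0, 1}^dim except the zero offset.
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    pairs = set()
    for block_id, members in blocks.items():
        # Pairs inside the same block (members are stored in increasing order).
        pairs.update(combinations(members, 2))
        # Pairs across adjacent blocks; this enumeration is the step the
        # abstract identifies as the runtime bottleneck for very large data.
        for offset in offsets:
            neighbor = tuple(b + o for b, o in zip(block_id, offset))
            # "neighbor > block_id" visits each unordered block pair once.
            if neighbor > block_id and neighbor in blocks:
                for i in members:
                    for j in blocks[neighbor]:
                        pairs.add((min(i, j), max(i, j)))
    return pairs


# Toy data: three nearby points and one far-away point.
sample = [(0.1, 0.1), (0.9, 0.9), (1.1, 1.1), (3.5, 3.5)]
close = grid_close_pairs(sample, width=1.0)
print(sorted(close))  # [(0, 1), (0, 2), (1, 2)]
```

With block side length `width`, any two points within L-infinity distance `width` differ by at most one block index per coordinate, so they are always returned; pairs from adjacent blocks can be up to (but less than) `2 * width` apart, which matches the guarantee stated in the abstract. Similarities would then be computed in the original space only for the returned pairs.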
Pages: 546-555
Number of pages: 10