High-Performance Geometric Algorithms for Sparse Computation in Big Data Analytics

Cited by: 0

Authors:

Baumann, Philipp [1]

Hochbaum, Dorit S. [2]

Spaen, Quico [2]

Affiliations:

[1] Univ Bern, Dept Business Adm, Bern, Switzerland

[2] Univ Calif Berkeley, IEOR Dept, Berkeley, CA 94720 USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017

Keywords:

Big data; similarity-based machine learning; sparsification; sparse computation; computational geometry

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Several leading supervised and unsupervised machine learning algorithms require as input similarities between objects in a data set. Since the number of pairwise similarities grows quadratically with the size of the data set, it is computationally prohibitive to compute all pairwise similarities for large-scale data sets. The recently introduced methodology of "sparse computation" resolves this issue by computing only the relevant similarities instead of all pairwise similarities. To identify the relevant similarities, sparse computation efficiently projects the data onto a low-dimensional space where a similarity is considered relevant if the corresponding objects are close in this space. The relevant similarities are then computed in the original space. Sparse computation identifies close pairs by partitioning the low-dimensional space into grid blocks, and considering objects close if they fall in the same or adjacent grid blocks. This guarantees that all pairs of objects that are within a specified L-infinity distance are identified as well as some pairs that are within twice this distance. For very large data sets, sparse computation can have high runtime due to the enumeration of pairs of adjacent blocks. We propose here new geometric algorithms that eliminate the need to enumerate adjacent blocks. Our empirical results on data sets with up to 10 million objects show that the new algorithms achieve a significant reduction in runtime. The algorithms have applications in large-scale computational geometry and (approximate) nearest neighbor search. Python implementations of the proposed algorithms are publicly available.
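
The grid-block step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not reproduce the new algorithms they propose; it only shows the baseline idea of enumerating adjacent blocks that the abstract identifies as the runtime bottleneck. The function name `grid_close_pairs`, the parameter `block_size`, and the assumption that the points have already been projected onto a low-dimensional space are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def grid_close_pairs(points, block_size):
    """Return index pairs (i, j), i < j, whose points fall in the same
    or adjacent grid blocks. `points` is a list of equal-length coordinate
    tuples in the (already projected) low-dimensional space; `block_size`
    is the side length of a grid block. Both names are illustrative."""
    if not points:
        return set()

    # Assign every point to the grid block that contains it.
    blocks = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // block_size) for c in p)
        blocks[key].append(idx)

    # Offsets to the block itself and to every adjacent block (3^d in total).
    dim = len(points[0])
    offsets = list(product((-1, 0, 1), repeat=dim))

    pairs = set()
    for key, members in blocks.items():
        for off in offsets:
            neighbor = tuple(k + o for k, o in zip(key, off))
            # .get avoids creating empty blocks while iterating.
            for j in blocks.get(neighbor, ()):
                for i in members:
                    if i < j:
                        pairs.add((i, j))
    return pairs


# Small 2-D example with unit blocks.
example = [(0.1, 0.1), (0.9, 0.9), (1.5, 0.2), (3.5, 3.5)]
print(sorted(grid_close_pairs(example, 1.0)))
# prints [(0, 1), (0, 2), (1, 2)]
```

With a block side length equal to the specified L-infinity distance, every pair within that distance lands in the same or adjacent blocks, and every reported pair is within twice that distance, which matches the guarantee stated in the abstract. The loop over `3^d` offsets per occupied block is the adjacent-block enumeration whose cost grows quickly with the dimension `d`. In the full method, the similarities of the returned pairs would then be computed in the original space.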

Pages: 546-555

Number of pages: 10
