Scalable Implementation of Dependence Clustering in Apache Spark

Cited by: 0

Authors:

Ivannikova, Elena [1]

Affiliations:

[1] Univ Jyvaskyla, Dept Math Informat Technol, POB 35 Agora, Jyvaskyla 40014, Finland

Source:

PROCEEDINGS OF THE 2017 EVOLVING AND ADAPTIVE INTELLIGENT SYSTEMS (EAIS) | 2017

Keywords:

COMMUNITY STRUCTURE;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. A fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is also introduced, and the proposed algorithm is benchmarked against spectral clustering. Results on real-life data indicate that the implementation scales well while still demonstrating good performance on densely connected graphs.


Pages: 6

50 items in total

[1] SpaRC: scalable sequence clustering using Apache Spark
Shi, Lizhen
Meng, Xiandong
Tseng, Elizabeth
Mascagni, Michael
Wang, Zhong
[J]. BIOINFORMATICS, 2019, 35 (05) : 760 - 768
[2] An Apache Spark Implementation for Text Document Clustering
Dritsas, Elias
Trigka, Maria
Vonitsanos, Gerasimos
Kanavos, Andreas
Mylonas, Phivos
[J]. 2022 17TH INTERNATIONAL WORKSHOP ON SEMANTIC AND SOCIAL MEDIA ADAPTATION & PERSONALIZATION (SMAP 2022), 2022, : 50 - 55
[3] Scalable Online-Offline Stream Clustering in Apache Spark
Backhoff, Omar
Ntoutsi, Eirini
[J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 37 - 44
[4] A Scalable Short-Text Clustering Algorithm Using Apache Spark
Akritidis, Leonidas
Alamaniotis, Miltiadis
Fevgas, Athanasios
Bozanis, Panayiotis
[J]. 2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 927 - 934
[5] Fuzzy Based Clustering Algorithms to Handle Big Data with Implementation on Apache Spark
Bharill, Neha
Tiwari, Aruna
Malviya, Aayushi
[J]. PROCEEDINGS 2016 IEEE SECOND INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2016), 2016, : 95 - 104
[6] Scalable Taxonomy Generation and Evolution on Apache Spark
Aalijah, Kanwal
Irfan, Rabia
[J]. 2020 IEEE INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, INTL CONF ON CLOUD AND BIG DATA COMPUTING, INTL CONF ON CYBER SCIENCE AND TECHNOLOGY CONGRESS (DASC/PICOM/CBDCOM/CYBERSCITECH), 2020, : 634 - 639
[7] Apache Spark-based scalable feature extraction approaches for protein sequence and their clustering performance analysis
Jha, Preeti
Tiwari, Aruna
Bharill, Neha
Ratnaparkhe, Milind
Patel, Om Prakash
Harshith, Nilagiri
Mounika, Mukkamalla
Nagendra, Neha
[J]. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2023, 15 (04) : 359 - 378
[8] Apache Spark-based scalable feature extraction approaches for protein sequence and their clustering performance analysis
Preeti Jha
Aruna Tiwari
Neha Bharill
Milind Ratnaparkhe
Om Prakash Patel
Nilagiri Harshith
Mukkamalla Mounika
Neha Nagendra
[J]. International Journal of Data Science and Analytics, 2023, 15 : 359 - 378
[9] Scalable Manifold Learning for Big Data with Apache Spark
Schoeneman, Frank
Zola, Jaroslaw
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 272 - 281
[10] Is Apache Spark Scalable to Seismic Data Analytics and Computations?
Yan, Yuzhong
Huang, Lei
Yi, Liqi
[J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2036 - 2045
