Scalable Implementation of Dependence Clustering in Apache Spark

Authors
Ivannikova, Elena [1]
Affiliations
[1] Univ Jyvaskyla, Dept Math Informat Technol, POB 35 Agora, Jyvaskyla 40014, Finland
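Keywords
COMMUNITY STRUCTURE
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract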
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. In addition, a fast approximate diffusion procedure is introduced that enables spectral-clustering-type algorithms in the Spark environment. The proposed algorithm is benchmarked against Spectral clustering. Results on real-life data indicate that the implementation scales well while still demonstrating good performance on densely connected graphs.
Pages: 6