High Performance Big Data Clustering

Cited by: 4
Authors
Agrawal, Ankit [1]
Patwary, Md. Mostofa Ali [1]
Hendrix, William [1]
Liao, Wei-keng [1]
Choudhary, Alok [1]
Affiliation
[1] Northwestern Univ, Dept EECS, Evanston, IL 60208 USA
Source
Keywords
big data; clustering; density-based clustering; hierarchical clustering; DBSCAN algorithm; parallel
DOI
10.3233/978-1-61499-322-3-192
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scientific advances are collectively exploding the amount, diversity, and complexity of available data. Our ability to collect huge amounts of data has greatly surpassed our analytical capacity to make sense of it. Efficient use of high performance computing techniques is critical to the success of the data-driven paradigm of scientific discovery. Data clustering is one of the fundamental analytics tasks heavily relied upon in many application domains, such as astrophysics, climate science, and bioinformatics. In this book chapter, we illustrate the challenges and opportunities in mining big data using two recently developed scalable parallel clustering algorithms. Experimental results on millions of high-dimensional data points clustered in parallel on thousands of processor cores are also presented.
Pages: 192-211
Page count: 20