Big Data Performance Analysis on a Hadoop Distributed File System Based on Modified Partitional Clustering Algorithm

Authors
Marichamy, V. Santhana [1]
Natarajan, V. [2]
Affiliations
[1] SRM Valliammai Engineering College, Department of Computer Applications, Chennai 603203, Tamil Nadu, India
[2] Anna University, Department of Instrumentation Engineering, MIT Campus, Chennai 600044, Tamil Nadu, India
Keywords
HDFS; PCA; VCVI; K-means cluster; Density parameter; Clustering time; Recall
DOI
10.1007/978-3-030-34515-0_48
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper proposes a Big Data performance analysis based on a modified Partitional Clustering Algorithm (PCA) running on the Hadoop Distributed File System (HDFS), which is commonly used in business applications. The paper uses an improved K-means clustering algorithm that selects the initial clustering centers according to a density parameter. After the density parameter of each data point is calculated, the point with the largest density parameter is selected as the first initial clustering center, and the data points lying within its neighborhood are deleted from the dataset. Repeating these steps yields K initial clustering centers. Such a method is needed to improve the precision and clustering effect of K-means, because random initialization gives poor assurance of finding good initial centers. Since the proposed approach does not select the initial clustering centers randomly, a stable K value can be obtained by calculating the Variance based Cluster Validity Index (VCVI). The performance of the proposed method is evaluated in terms of precision, clustering time and recall. Experimental results, compared with existing methods on these parameters, show that the proposed approach reduces complexity.
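The abstract outlines density-based seeding for K-means and a variance-based index for choosing K, but it does not define the density parameter, the neighborhood size, or the VCVI formula. The sketch below illustrates the idea on a single machine (not on HDFS) under stated assumptions: the density of a point is taken to be the number of other points within a hypothetical `radius`, and because the VCVI formula is not given here, the Calinski-Harabasz variance-ratio criterion is swapped in as a stand-in for scoring each K. All function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def pairwise_dist(A, B):
    # Euclidean distance between every row of A and every row of B
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)


def density_initial_centers(X, k, radius):
    """Pick k initial centers: take the densest remaining point, then delete it
    and every remaining point inside its radius-neighbourhood; repeat."""
    remaining = np.asarray(X, dtype=float)
    centers = []
    while len(centers) < k and len(remaining) > 0:
        d = pairwise_dist(remaining, remaining)
        # Assumed density parameter: neighbours within `radius` (self excluded)
        density = (d <= radius).sum(axis=1) - 1
        i = int(np.argmax(density))
        centers.append(remaining[i])
        remaining = remaining[d[i] > radius]  # also drops point i (d[i, i] = 0)
    if len(centers) < k:
        raise ValueError("radius too large: fewer than k centers found")
    return np.array(centers)


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given (non-random) centers."""
    X = np.asarray(X, dtype=float)
    for _ in range(max_iter):
        labels = np.argmin(pairwise_dist(X, centers), axis=1)
        # Keep the old center if a cluster becomes empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return np.argmin(pairwise_dist(X, centers), axis=1), centers


def variance_ratio(X, labels, centers):
    """Stand-in for VCVI: Calinski-Harabasz ratio
    (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    n, k = len(X), len(centers)
    within = sum(((X[labels == j] - centers[j]) ** 2).sum() for j in range(k))
    sizes = np.bincount(labels, minlength=k)
    between = (sizes * ((centers - X.mean(axis=0)) ** 2).sum(axis=1)).sum()
    return (between / (k - 1)) / (within / (n - k))


def choose_k(X, radius, k_max):
    """Try K = 2..k_max and keep the K with the highest variance ratio."""
    X = np.asarray(X, dtype=float)
    best = None
    for k in range(2, k_max + 1):
        try:
            init = density_initial_centers(X, k, radius)
        except ValueError:
            # Seeding is greedy and deterministic, so larger K fails too
            break
        labels, centers = kmeans(X, init)
        score = variance_ratio(X, labels, centers)
        if best is None or score > best[0]:
            best = (score, k, labels, centers)
    if best is None:
        raise ValueError("no valid K in range; try a smaller radius")
    return best[1], best[2], best[3]
```

For example, on three small, well-separated 3x3 grids of points centered at (0, 0), (10, 10) and (20, 0) with `radius=1.0`, the seeding step picks the three grid centers, and `choose_k` prefers K = 3 over K = 2. Because seeding is deterministic, repeated runs give the same centers and the same K, which is the stability property the abstract attributes to non-random initialization.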
Pages: 461-468
Number of pages: 8