An Efficient Distributed Database Clustering Algorithm for Big Data Processing

Times cited: 0
Authors
Sun, Qiao [1]
Fu, Lan-mei [1]
Deng, Bu-qiao [1]
Pei, Xu-bin [2]
Sun, Jia-song [3]
Affiliations
[1] Beijing GuoDianTong Network Technol Co Ltd, Beijing, Peoples R China
[2] State Grid Zhejiang Elect Power Co Ltd, Hangzhou, Zhejiang, Peoples R China
[3] Tsinghua Univ, EE Dept, Beijing, Peoples R China
Keywords
Distributed big data processing; Distributed database; Data clustering; Deep neural network; K-means
DOI
10.23977/iccsc.2017.1012
CLC number (Chinese Library Classification)
TP301 [Theory and methods]
Subject classification code
081202
Abstract
This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is treated as an input vector; its features are extracted and fed to the input layer of the deep neural network. The connection weights are trained with the backpropagation (BP) algorithm, and the network's output is learned by adjusting these weights. Finally, the clustering result for each record is determined by the similarity of the output vector corresponding to the current input vector. Experiments on a small-scale distributed system show that the method achieves higher test-set accuracy than the traditional k-means clustering method and is better suited to large-scale data clustering in distributed environments.
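As a point of reference, the traditional k-means baseline the abstract compares against can be sketched in plain Python. This is standard Lloyd's k-means, not the authors' deep-network method. The helper names (`kmeans`, `cosine_similarity`, `most_similar`), the choice of the first k records as initial centroids, and the use of cosine similarity for a similarity-based final assignment are illustrative assumptions; the abstract does not specify any of them.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(records: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Standard (Lloyd's) k-means; returns (centroids, cluster label per record)."""
    # Assumption: seed the centroids with the first k records so the run is deterministic.
    centroids = [list(r) for r in records[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in records
        ]
        if new_labels == labels:
            break  # converged: no record changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its member records.
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[c] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centroids, labels


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (0.0 if either is the zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(vector: Vector, prototypes: List[Vector]) -> int:
    """Index of the prototype most similar to `vector` (hypothetical final assignment step)."""
    # Assumption: cosine similarity; the abstract does not name its similarity measure.
    return max(range(len(prototypes)), key=lambda i: cosine_similarity(vector, prototypes[i]))
```

For example, `kmeans([[0.0, 0.0], [10.0, 10.0], [0.5, 0.5], [9.5, 9.5]], k=2)` separates the two obvious groups, returning labels `[0, 1, 0, 1]` and centroids `[[0.25, 0.25], [9.75, 9.75]]`.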
Pages: 70-74
Number of pages: 5
Related papers
50 in total
  • [41] Distributed Database and Application Architecture for Big Data Solutions
    Misaki, Makoto
    Tsuda, Tomio
    Inoue, Shinji
    Sato, Shintaro
    Kayahara, Akihiro
    Imai, Shin-ichi
    INTERNATIONAL SYMPOSIUM ON SEMICONDUCTOR MANUFACTURING (ISSM) 2016 PROCEEDINGS OF TECHNICAL PAPERS, 2016
  • [42] Performance Enhancement of Distributed Clustering for Big Data Analytics
    Mohamed, Omar Hesham
    Shehab, Mohamed Elemam
    El Fakharany, Essam
    INTERNATIONAL CONFERENCE ON ADVANCED MACHINE LEARNING TECHNOLOGIES AND APPLICATIONS (AMLTA2018), 2018, 723 : 415 - 425
  • [43] Heterogeneous Distributed Big Data Clustering on Sparse Grids
    Pfander, David
    Daiss, Gregor
    Pflueger, Dirk
    ALGORITHMS, 2019, 12 (03)
  • [44] Intelligent Distributed Processing Methods for Big Data
    Jung, Jason J.
    Camacho, David
    Badica, Costin
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2015, 21 (06) : 754 - 756
  • [45] A Distributed Weighted Possibilistic c-Means Algorithm for Clustering Incomplete Big Sensor Data
    Zhang, Qingchen
    Chen, Zhikui
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2014
  • [46] Efficient Utilization of Big Data using Distributed Storage, Parallel Processing, and Blockchain Technology
    Giuliano, Alessandro
    Hilal, Waleed
    Alsadi, Naseem
    Surucu, Onur
    Gadsden, S. Andrew
    Yawney, John
    Ziada, Youssef
    BIG DATA IV: LEARNING, ANALYTICS, AND APPLICATIONS, 2022, 12097
  • [47] Scalable decision fusion algorithm for enabling decentralized computation in distributed, big data clustering problems
    Jennath, H. S.
    Asharaf, S.
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (09) : 3803 - 3827
  • [48] An efficient parallel clustering algorithm for large scale database
    Authors not listed (affiliation: School of Electronic Information, Wuhan University, Wuhan, Hubei, China)
    J. Softw., 2009, 10 : 1119 - 1126
  • [49] AN EFFICIENT DATA LOCALITY SYSTEM FOR BIG DATA PROCESSING OVER DISTRIBUTED DATA CENTRE BY USING SCHEDULING TECHNIQUE
    Kaur, Navneet
    Gupta, S. C.
    Kumar, Deepak
    ADVANCES AND APPLICATIONS IN MATHEMATICAL SCIENCES, 2019, 18 (08) : 651 - 664
  • [50] Map-Matching on Big Data: a Distributed and Efficient Algorithm with a Hidden Markov Model
    Francia, Matteo
    Gallinucci, Enrico
    Vitali, Federico
    2019 42ND INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2019 : 1238 - 1243