The Application of Spark-Based Gaussian Mixture Model for Farm Environmental Data Analysis

Cited by: 2
Authors
Pang, Honglin [1, 2]
Deng, Li [1, 2]
Wang, Ling [1, 2]
Fei, Minrui [1, 2]
Affiliations
[1] Shanghai Univ, Sch Mechatron Engn & Automat, Shanghai 200072, Peoples R China
[2] Shanghai Key Lab Power Stn Automat Technol, Shanghai 200072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Gaussian Mixture Model; Dirichlet Process; Gibbs sampling; Spark; Anomaly detection; IDENTIFICATION; SIGNALS;
DOI
10.1007/978-981-10-2669-0_18
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
To take full account of the characteristics of the environmental data set, the Gaussian mixture model (GMM) is combined with the Dirichlet Process (DP) to address the problem of specifying the initial number of clusters. The Gibbs sampling algorithm is used in place of the Expectation Maximization algorithm to estimate the parameters of the model with the Dirichlet Process. The clustering process is implemented on the Spark framework so that it can handle farm environmental data sets stored on a distributed computer cluster. Experimental results, evaluated with an external criterion, show that the improved clustering method detects data anomalies better than other common clustering methods. The improved clustering method is then used to implement anomaly detection on farm environmental data.
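The combination of a Dirichlet Process prior with Gibbs sampling can be illustrated with a small single-machine sketch. This is not the authors' Spark implementation, which the record does not show. It is a collapsed Gibbs sampler for a one-dimensional DP mixture of Gaussians, under simplifying assumptions that are not taken from the paper: a known, shared component variance `sigma2`, a conjugate normal prior `N(mu0, tau2)` on each component mean, and a concentration parameter `alpha`. The function name `dp_gmm_gibbs` and all parameter names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, total, sigma2, mu0, tau2):
    """Posterior predictive density of x for a cluster with n points summing to `total`.

    With n == 0 this reduces to the prior predictive N(mu0, tau2 + sigma2).
    """
    precision = 1.0 / tau2 + n / sigma2
    post_mean = (mu0 / tau2 + total / sigma2) / precision
    return normal_pdf(x, post_mean, 1.0 / precision + sigma2)


def dp_gmm_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=100.0, sweeps=50, seed=0):
    """Return one cluster label per point; the number of clusters is not fixed in advance."""
    rng = random.Random(seed)
    labels = [0] * len(data)      # assumption: start with every point in one cluster
    counts = {0: len(data)}       # points per cluster
    sums = {0: sum(data)}         # sum of values per cluster
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster, dropping the cluster if empty.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing cluster: weight n_k * predictive density.
            candidates = list(counts)
            weights = [
                counts[c] * predictive(x, counts[c], sums[c], sigma2, mu0, tau2)
                for c in candidates
            ]
            # New cluster: weight alpha * prior predictive density.
            candidates.append(next_label)
            weights.append(alpha * predictive(x, 0, 0.0, sigma2, mu0, tau2))
            k = rng.choices(candidates, weights=weights)[0]
            if k == next_label:
                counts[k] = 0
                sums[k] = 0.0
                next_label += 1
            labels[i] = k
            counts[k] += 1
            sums[k] += x
    return labels
```

Calling `dp_gmm_gibbs(values)` on a list of sensor readings returns cluster labels without a preset cluster count. One possible way to use such labels for anomaly detection, again an assumption rather than the paper's stated rule, is to treat points that end up in very small clusters as candidate anomalies.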
Pages: 164-173
Page count: 10