A Discretization Algorithm for Meteorological Data and its Parallelization Based on Hadoop

Cited by: 2
Authors
Liu, Chao [1]
Jin, Wen [1]
Yu, Yuting [1]
Qiu, Taorong [1]
Bai, Xiaoming [1]
Zou, Shuilong [1]
Affiliations
[1] Nanchang Inst Sci & Technol, Sch Elect & Informat Engn, Nanchang, Jiangxi, Peoples R China
DOI
10.1088/1742-6596/910/1/012011
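Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract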
Meteorological observation data are large in volume and have many attributes, the attribute values are continuous, and applications of the data depend on the correlations among meteorological elements. This paper addresses how to better discretize large-scale meteorological data so that the knowledge hidden in the data can be mined more effectively, and studies improvements to discretization algorithms for large-scale data. To discretize large-scale meteorological data in a way that supports subsequent knowledge extraction, a discretization algorithm based on information entropy and the inconsistency of meteorological attributes is proposed, and the algorithm is parallelized on the Hadoop platform. Finally, comparison tests validate the effectiveness of the proposed algorithm for discretizing large-scale meteorological data.
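The abstract names two ingredients, information entropy and inconsistency, without giving the algorithm itself. As a rough illustration only, the sketch below shows the generic textbook versions of both: choosing a single cut point on one continuous attribute by minimizing class-weighted entropy, and measuring the inconsistency rate of a discretized table. It is not the authors' method; the function names (`entropy`, `best_cut`, `inconsistency_rate`) and the sample temperature values are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut, weighted_entropy) for the boundary between two
    distinct adjacent values that minimizes class-weighted entropy.
    Returns (None, inf) if all values are identical."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        h = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if h < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, h)
    return best


def inconsistency_rate(rows, labels):
    """Fraction of records that share discretized attribute values with
    other records but do not carry that group's majority label."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(labels)


# Hypothetical sample: temperature readings with a binary class label.
temps = [12.0, 14.5, 15.0, 21.0, 23.5, 25.0]
classes = ["no", "no", "no", "yes", "yes", "yes"]
cut, _ = best_cut(temps, classes)
codes = [(0,) if t <= cut else (1,) for t in temps]
print(cut, inconsistency_rate(codes, classes))
```

Because each attribute's cut-point search is independent of the others, one plausible way to parallelize such a procedure on Hadoop would be to run the search per attribute in separate map tasks; that is an assumption about the general approach, not a description of the paper's design.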
Pages: 7