Analysis of Massive Industrial Data using MapReduce Framework for Parallel Processing

Authors
Aly, Mohab [1]
Yacout, Soumaya [1]
Shaban, Yasser [2]
Affiliations
[1] Ecole Polytech Montreal, Dept Ind Engn, CP 6079,Succ Ctr Ville, Montreal, PQ H3C 3A7, Canada
[2] Helwan Univ, Dept Mech Design Engn, POB 11718, Cairo, Egypt
Keywords
Cloud Computing; Big Data; MapReduce; Parallel Processing; Data Mining
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the emergence of the 'Big Data' paradigm, more and more industrial data are becoming available to practitioners and professionals. These data are being generated faster owing to advances in information technology. For reliability and maintenance engineers, 'Big Data' is a valuable source of information: analyzed correctly, it can yield a knowledge base that supports decision-making in an industrial organization. The availability of 'Big Data' has opened a new research area dedicated to analyzing such data. This paper shows how to analyze massive amounts of data generated by one or more industrial systems. Such data may range from terabytes to petabytes in size. Analysis at this scale cannot be performed on a single commodity computer, because the data may not fit within the machine's memory and processing resources; even if they fit, the analysis would take an unacceptable amount of time. Processing large industrial data sets therefore requires high-performance analytical systems running in distributed environments. Different algorithms can be considered for this analysis. Cloud computing models provide the scalable and flexible infrastructure needed to adapt standard analytics algorithms to run in a distributed manner. We introduce a new distributed training technique that combines MapReduce, a now widely used framework for big dataflow, with traditional building blocks of machine learning such as matrix multiplication and linear regression. Parallel processing of these computations relies on adapting different algorithms to MapReduce and its framework. Our platform is deployed on top of Google Cloud Platform (App Engine and Compute Engine), and we also consider Amazon EMR, to see how the resources provisioned by each can be used to speed up the analysis of massive industrial data and the extraction of useful information from it, i.e., to reduce computational time.
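As an illustration of how a linear regression can be phrased in MapReduce terms, the following is a minimal single-machine sketch in Python. It is not the authors' implementation, which the abstract does not describe in detail. It assumes a simple one-variable regression, and the names `map_chunk`, `combine`, and `fit_line` are hypothetical. Each map call reduces one data partition to a small tuple of partial sums, and the reduce step adds those tuples, so only a few numbers per partition would need to cross the network.

```python
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one partition of (x, y) pairs as partial sums."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: add two tuples of partial sums element-wise."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(chunks):
    """Fit y = slope * x + intercept from the globally reduced sums."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_chunk, chunks))
    denom = n * sxx - sx * sx  # zero only if all x values are identical
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical toy data split into two partitions; the points lie on y = 2x + 1.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
]
slope, intercept = fit_line(chunks)
print(slope, intercept)
```

Because the partial sums are associative and commutative, the same `combine` function could in principle serve as both a combiner and a reducer in a real MapReduce job; here `map` and `functools.reduce` only stand in for the distributed runtime.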
Pages: 6