A Survey and Recommendations for Distributed, Parallel, Single Pass, Incremental Bayesian Classification based on MapReduce for Big Data

Cited by: 0
Authors
Shafiq, M. Omair [1]
Yang, Yibing [1]
Fekri, Maryam [1]
Affiliation
[1] Carleton Univ, Sch Informat Technol, Ottawa, ON, Canada
Keywords
Distributed; Parallel; Single-pass; Incremental; Bayesian; Classification; Big Data;
DOI
10.1109/HPCCWS.2017.00013
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In the emerging digital age, massive amounts of data are produced, actively or passively, by collecting data from users and the environment via applications, sensor devices and so on. This makes it crucial to be able to process big data efficiently and utilize it effectively. The challenge in processing big data is that it has high volume, velocity and variety, as well as veracity and value. In this paper, we present a survey of related work and prescribe our recommendations towards building Bayesian classification for big data environments. The proposed solution is based on MapReduce and is distributed, parallel, single-pass and incremental, which makes it feasible to deploy and execute on a cloud computing platform. We also carry out a scalability analysis of the proposed solution, showing that it can train a Bayesian classifier to perform predictive analytics by processing big data with large volume, velocity and variety.
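The abstract does not give the paper's algorithm, so the following is only a minimal sketch of how a naive Bayes classifier can in general be trained in a single pass, in parallel and incrementally with a map/reduce pattern. It is not the authors' implementation. The function names (`map_partition`, `reduce_counts`, `predict`), the toy weather data, the Laplace smoothing constant `alpha` and the assumed number of distinct values per feature `n_values` are all hypothetical choices made for illustration.

```python
from collections import Counter
from functools import reduce
import math


def map_partition(records):
    """Map step: read one partition once and emit its count statistics."""
    class_counts = Counter()
    feature_counts = Counter()
    for features, label in records:
        class_counts[label] += 1
        for index, value in enumerate(features):
            feature_counts[(label, index, value)] += 1
    return class_counts, feature_counts


def reduce_counts(left, right):
    """Reduce step: counts are additive, so merging two models is a sum."""
    return left[0] + right[0], left[1] + right[1]


def predict(model, features, alpha=1.0, n_values=2):
    """Return the most probable label under Laplace-smoothed naive Bayes.

    n_values is the assumed number of distinct values per feature
    (2 in the toy data below); it is an illustrative parameter.
    """
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for index, value in enumerate(features):
            hits = feature_counts[(label, index, value)]
            score += math.log((hits + alpha) / (count + alpha * n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data split into two partitions (one per mapper).
partitions = [
    [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")],
    [(("sunny", "mild"), "no"), (("rainy", "hot"), "yes")],
]
model = reduce(reduce_counts, map(map_partition, partitions))

# Incremental update: a new batch is counted once and merged into the
# existing model, without revisiting earlier data.
new_batch = [(("rainy", "mild"), "yes")]
model = reduce_counts(model, map_partition(new_batch))

prediction = predict(model, ("rainy", "hot"))
```

Because the model is nothing more than additive counts, each partition can be processed independently and read exactly once, and new data only needs to be counted and merged in. That is the general property that makes this family of classifiers a natural fit for MapReduce.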
Pages: 42 - 49
Page count: 8
Related papers
50 in total
  • [21] K-Means Parallel Algorithm of Big Data Clustering Based on Mapreduce PCAM Method
    Li, Yongyi
    Yang, Zhongqiang
    Han, Kaixu
    Engineering Intelligent Systems, 2021, 29 (06): : 411 - 418
  • [22] MapReduce-based parallel GEP algorithm for efficient function mining in big data applications
    Liu, Yang
    Ma, Chenxiao
    Xu, Lixiong
    Shen, Xiaodong
    Li, Maozhen
    Li, Pengcheng
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (23):
  • [23] A review on big data based parallel and distributed approaches of pattern mining
    Kumar, Sunil
    Mohbey, Krishna Kumar
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (05) : 1639 - 1662
  • [24] Distributed Big Data Clustering using MapReduce-based Fuzzy C-Medoids
    Sardar T.H.
    Ansari Z.
    Journal of The Institution of Engineers (India): Series B, 2022, 103 (01) : 73 - 82
  • [25] Resource and Cost Aware Glowworm Mapreduce Optimization Based Big Data Processing in Geo Distributed Data Center
    Nithyanantham, S.
    Singaravel, G.
    WIRELESS PERSONAL COMMUNICATIONS, 2021, 117 (04) : 2831 - 2852
  • [26] Resource and Cost Aware Glowworm Mapreduce Optimization Based Big Data Processing in Geo Distributed Data Center
    S. Nithyanantham
    G. Singaravel
    Wireless Personal Communications, 2021, 117 : 2831 - 2852
  • [27] A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules
    Sara del Río
    Victoria López
    José Manuel Benítez
    Francisco Herrera
    International Journal of Computational Intelligence Systems, 2015, 8 : 422 - 437
  • [28] Improved KD-tree based imbalanced big data classification and oversampling for MapReduce platforms
    Sleeman, William C.
    Roseberry, Martha
    Ghosh, Preetam
    Cano, Alberto
    Krawczyk, Bartosz
    APPLIED INTELLIGENCE, 2024, 54 (23) : 12558 - 12575
  • [29] Big data classification using heterogeneous ensemble classifiers in Apache Spark based on MapReduce paradigm
    Kadkhodaei, Hamidreza
    Moghadam, Amir Masoud Eftekhari
    Dehghan, Mehdi
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 183
  • [30] A MapReduce Approach to Address Big Data Classification Problems Based on the Fusion of Linguistic Fuzzy Rules
    del Rio, Sara
    Lopez, Victoria
    Manuel Benitez, Jose
    Herrera, Francisco
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2015, 8 (03) : 422 - 437