Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce

Cited by: 95
Authors
Ramirez-Gallego, Sergio [1]
Fernandez, Alberto [1]
Garcia, Salvador [1]
Chen, Min [2]
Herrera, Francisco [1]
Affiliations
[1] University of Granada, Department of Computer Science and Artificial Intelligence, Granada, Spain
[2] Huazhong University of Science and Technology, School of Computer Science and Technology, Wuhan, Hubei, People's Republic of China
Keywords
Big Data Analytics; MapReduce; Information fusion; Spark; Machine learning; Business intelligence; Systems; Insight
DOI
10.1016/j.inffus.2017.10.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We live in a world where data are generated from a myriad of sources, and it is very cheap to collect and store such data. However, the real benefit lies not in the data themselves but in the algorithms that are capable of processing them within a tolerable elapsed time and of extracting valuable knowledge from them. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. The MapReduce programming framework can be regarded as the main paradigm underlying such tools. It is mainly characterized by distributed execution, which provides a high degree of scalability, together with a fault-tolerant scheme. In every MapReduce algorithm, local models are first learned within the so-called Map tasks, each from a subset of the original data. Then, the Reduce task fuses the partial outputs generated by each Map. The way this fusion of information/models is designed may have a strong impact on the quality of the final system. In this work, we will enumerate and analyze two alternative methodologies that may be found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to provide an introduction to the characteristics of these methodologies, as well as to give some guidelines for the design of novel algorithms in this field of research. Finally, a short experimental study will allow us to contrast the scalability issues of each type of process fusion in MapReduce for Big Data Analytics.
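As a hypothetical illustration of the Map/Reduce fusion described above, the Python sketch below fits a one-variable least-squares line y = a + b*x on data split across simulated Map tasks. It contrasts two fusion schemes chosen only for this example: summing partial sufficient statistics in the Reduce step, which reproduces the sequential fit exactly, and averaging locally learned models, which in general does not. The abstract does not name its two methodologies, so these schemes are assumptions and not necessarily the ones analyzed in the paper. All function names are hypothetical, and the "cluster" is simulated in a single process with plain Python rather than Hadoop or Spark.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[float, float]                     # (x, y)
Stats = Tuple[int, float, float, float, float]  # (n, sum_x, sum_y, sum_xx, sum_xy)
Model = Tuple[float, float]                     # (intercept a, slope b)


def split(data: List[Point], k: int) -> List[List[Point]]:
    """Simulate the framework splitting the input into contiguous chunks, one per Map task."""
    size = -(-len(data) // k)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def fit_from_stats(s: Stats) -> Model:
    """Closed-form least-squares fit of y = a + b*x.
    Assumes the data behind s contain at least two distinct x values."""
    n, sx, sy, sxx, sxy = s
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b


# Scheme 1 (hypothetical): Map emits partial statistics, Reduce sums them (exact fusion).
def map_stats(chunk: List[Point]) -> Stats:
    return (len(chunk),
            sum(x for x, _ in chunk),
            sum(y for _, y in chunk),
            sum(x * x for x, _ in chunk),
            sum(x * y for x, y in chunk))


def reduce_stats(s1: Stats, s2: Stats) -> Stats:
    # Element-wise sum: associative and commutative, so partial results merge in any order.
    return tuple(u + v for u, v in zip(s1, s2))


# Scheme 2 (hypothetical): Map learns a full local model, Reduce averages the models.
def map_local_model(chunk: List[Point]) -> Tuple[int, Model]:
    return len(chunk), fit_from_stats(map_stats(chunk))


def reduce_models(models: List[Tuple[int, Model]]) -> Model:
    # Weight each local model by the number of examples it was trained on.
    total = sum(n for n, _ in models)
    a = sum(n * m[0] for n, m in models) / total
    b = sum(n * m[1] for n, m in models) / total
    return a, b


if __name__ == "__main__":
    data = [(x, x * x) for x in range(8)]  # toy data: y = x*x for x = 0..7
    chunks = split(data, 2)
    exact = fit_from_stats(reduce(reduce_stats, map(map_stats, chunks)))
    fused = reduce_models([map_local_model(c) for c in chunks])
    print(exact)  # (-7.0, 7.0), identical to fitting all data at once
    print(fused)  # (-15.0, 7.0)
```

On this toy data, split into two contiguous halves, both fused models share the slope 7.0, but averaging the local models yields an intercept of -15.0 instead of the exact -7.0: a small example of how the design of the fusion step can affect the quality of the final model.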
Pages: 51-61
Page count: 11
Related papers (50 in total)
  • [31] Hierarchical attribute reduction algorithms for big data using MapReduce
    Qian, Jin
    Lv, Ping
    Yue, Xiaodong
    Liu, Caihui
    Jing, Zhengjun
    [J]. KNOWLEDGE-BASED SYSTEMS, 2015, 73: 18-31
  • [32] Parallel knowledge acquisition algorithms for big data using MapReduce
    Qian, Jin
    Xia, Min
    Yue, Xiaodong
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2018, 9 (06): 1007-1021
  • [34] A MapReduce Cortical Algorithms Implementation for Unsupervised Learning of Big Data
    Hajj, Nadine
    Rizk, Yara
    Awad, Mariette
    [J]. INNS CONFERENCE ON BIG DATA 2015 PROGRAM, 2015, 53: 327-334
  • [35] Big Data fingerprinting information analytics for sustainability
    Kobusinska, Anna
    Pawluczuk, Kamil
    Brzezinski, Jerzy
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 86: 1321-1337
  • [36] Online learning algorithms for big data analytics: A survey
    Li, Zhijie
    Li, Yuanxiang
    Wang, Feng
    He, Guoliang
    Kuang, Li
    [J]. JISUANJI YANJIU YU FAZHAN/COMPUTER RESEARCH AND DEVELOPMENT, 2015, 52 (08): 1707-1721
  • [37] Different Clustering Algorithms for Big Data Analytics: A Review
    Dave, Meenu
    Gianey, Hemant
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON SYSTEM MODELING & ADVANCEMENT IN RESEARCH TRENDS (SMART-2016), 2016: 328-333
  • [38] A Survey of Machine Learning Algorithms for Big Data Analytics
    Athmaja, S.
    Hanumanthappa, M.
    Kavitha, Vasantha
    [J]. 2017 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2017
  • [39] Big data analytics for retail industry using MapReduce-Apriori framework
    Verma, Neha
    Malhotra, Dheeraj
    Singh, Jatinder
    [J]. JOURNAL OF MANAGEMENT ANALYTICS, 2020, 7 (03): 424-442
  • [40] Big Data Analytics for Industrial Process Control
    Khan, Abdul Rauf
    Schioler, Henrik
    Kulahci, Murat
    Knudsen, Torben
    [J]. 2017 22ND IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2017