Big Data: Tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce

Cited by: 95

Authors:

Ramirez-Gallego, Sergio [1]

Fernandez, Alberto [1]

Garcia, Salvador [1]

Chen, Min [2]

Herrera, Francisco [1]

Affiliations:

[1] Univ Granada, Dept Comp Sci & Artificial Intelligence, Granada, Spain

[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Hubei, Peoples R China

Source:

INFORMATION FUSION | 2018, Volume 42

Keywords:

Big Data Analytics; MapReduce; Information fusion; Spark; Machine learning; BUSINESS INTELLIGENCE; SYSTEMS; INSIGHT;

DOI:

10.1016/j.inffus.2017.10.001

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We live in a world where data are generated from a myriad of sources, and it is very cheap to collect and store such data. However, the real benefit lies not in the data itself but in the algorithms that can process such data within a tolerable elapsed time and extract valuable knowledge from it. Therefore, the use of Big Data Analytics tools provides very significant advantages to both industry and academia. The MapReduce programming framework stands out as the main paradigm associated with such tools. It is mainly characterized by distributed execution, which provides a high degree of scalability together with a fault-tolerant scheme. In every MapReduce algorithm, local models are first learned from a subset of the original data within the so-called Map tasks. Then, the Reduce task fuses the partial outputs generated by each Map. The way this fusion of information/models is designed may have a strong impact on the quality of the final system. In this work, we enumerate and analyze two alternative methodologies that may be found both in the specialized literature and in standard Machine Learning libraries for Big Data. Our main objective is to provide an introduction to the characteristics of these methodologies, as well as to give some guidelines for the design of novel algorithms in this field of research. Finally, a short experimental study allows us to contrast the scalability issues for each type of process fusion in MapReduce for Big Data Analytics.
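
The Map-then-fuse pattern summarized in the abstract can be illustrated with a minimal single-process Python sketch. It is not taken from the paper: the nearest-centroid model, the function names `map_task`, `reduce_task` and `centroids`, and the toy data are all assumptions chosen for illustration, and nothing here runs on Hadoop or Spark. Each Map call learns a local model (per-class counts and feature sums) from one partition, and the Reduce step fuses two partial models by adding them.

```python
from functools import reduce

def map_task(partition):
    """Learn a local model on one partition: per-class (count, feature sums)."""
    model = {}
    for features, label in partition:
        count, total = model.get(label, (0, [0.0] * len(features)))
        model[label] = (count + 1, [t + x for t, x in zip(total, features)])
    return model

def reduce_task(model_a, model_b):
    """Fuse two partial models by adding their per-class counts and sums."""
    fused = dict(model_a)
    for label, (count, total) in model_b.items():
        if label in fused:
            c, t = fused[label]
            fused[label] = (c + count, [u + v for u, v in zip(t, total)])
        else:
            fused[label] = (count, total)
    return fused

def centroids(model):
    """Turn (count, sums) statistics into per-class mean vectors."""
    return {label: [s / count for s in total]
            for label, (count, total) in model.items()}

# Hypothetical toy data: (feature vector, class label) pairs.
data = [
    ([1.0, 2.0], "a"),
    ([3.0, 4.0], "a"),
    ([10.0, 10.0], "b"),
    ([5.0, 6.0], "a"),
    ([20.0, 30.0], "b"),
]
partitions = [data[:2], data[2:4], data[4:]]  # stand-in for distributed splits

local_models = [map_task(p) for p in partitions]   # Map phase
global_model = reduce(reduce_task, local_models)   # Reduce phase (fusion)
fused_centroids = centroids(global_model)
```

In this particular sketch the fusion is exact, because counts and sums are additive: the fused centroids match those computed from the whole data set in one pass. Fusing models that lack this property (for example, voting over independently trained classifiers) generally gives only an approximation of the model learned on all the data.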


Pages: 51-61

Page count: 11
