Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures

Cited by: 12

Authors:

Dos Anjos, Julio C. S. ^{[1]}

Matteussi, Kassiano J. ^{[1]}

De Souza, Paulo R. R., Jr. ^{[1]}

Grabher, Gabriel J. A. ^{[1]}

Borges, Guilherme A. ^{[1]}

Barbosa, Jorge L. V. ^{[2]}

Gonzalez, Gabriel V. ^{[3]}

Leithardt, Valderi R. Q. ^{[4,5,6]}

Geyer, Claudio F. R. ^{[1]}

Affiliations:

[1] Univ Fed Rio Grande do Sul, Inst Informat, UFRGS PPGC, BR-91501970 Porto Alegre, RS, Brazil

[2] Univ Vale Rio dos Sinos, UNISINOS, PPGCA, BR-93022750 Sao Leopoldo, Brazil

[3] Univ Salamanca, Expert Syst & Applicat Lab, Fac Sci, Salamanca 37008, Spain

[4] Inst Politecn Portalegre, VALORIZA Res Ctr, P-7300110 Portalegre, Portugal

[5] Univ Vale Itajai, Lab Embedded & Distributed Syst, BR-88302901 Itajai, SC, Brazil

[6] Univ Lusofona Humanidades & Tecnol, COPELABS, P-1700097 Lisbon, Portugal

Source:

IEEE ACCESS | 2020 / Vol. 8 / Issue 08

Keywords:

Cloud computing; Big Data; Computational modeling; Analytical models; Data models; Real-time systems; Big data analytics; cloud computing; hybrid infrastructures; MapReduce; volunteer computing; MAPREDUCE; CLOUD;

DOI:

10.1109/ACCESS.2020.3023344

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Big Data applications are present in many areas, such as financial markets, search engines, streaming services, health care, and social networks. Data analysis adds value to information for organizations. Classical Cloud Computing offers a robust architecture for the complex, large-scale computing these areas require. The main challenges are users' lack of knowledge about Cloud infrastructure, the requirements for improving performance, and the resource management needed to maintain stable processing. Given these difficulties, an inadequate solution can lead users to overestimate or underestimate the number of computational resources needed, which drives up the budget. One way to work around this problem is to use Volunteer Computing, since it provides distributed computational resources at no monetary cost. However, volatile machine behavior is a problem that must be addressed in Big Data data distribution. Thus, this work proposes a data distribution model that combines Cloud Computing and Volunteer Computing environments in a hybrid fashion for Big Data analytics. The contributions of this work are: i) the evaluation required to enable efficient deployment of Big Data in hybrid infrastructures; ii) the development of the HR_Alloc algorithm for establishing data placement for Big Data applications; iii) a model for resource allocation in hybrid infrastructures. The results indicate the feasibility of using a hybrid infrastructure with up to 35% of unstable machines in the worst-case scenario, without losing performance and with a monetary cost lower than 20% in comparison to Classical Cloud Computing. In addition, communication costs decrease by up to 57.14% in the best-case scenario due to load balancing.

Pages: 170281-170294

Page count: 14

