Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures

Cited by: 12
Authors
Dos Anjos, Julio C. S. [1 ]
Matteussi, Kassiano J. [1 ]
De Souza, Paulo R. R., Jr. [1 ]
Grabher, Gabriel J. A. [1 ]
Borges, Guilherme A. [1 ]
Barbosa, Jorge L. V. [2 ]
Gonzalez, Gabriel V. [3 ]
Leithardt, Valderi R. Q. [4 ,5 ,6 ]
Geyer, Claudio F. R. [1 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, UFRGS PPGC, BR-91501970 Porto Alegre, RS, Brazil
[2] Univ Vale Rio dos Sinos, UNISINOS, PPGCA, BR-93022750 Sao Leopoldo, Brazil
[3] Univ Salamanca, Expert Syst & Applicat Lab, Fac Sci, Salamanca 37008, Spain
[4] Inst Politecn Portalegre, VALORIZA Res Ctr, P-7300110 Portalegre, Portugal
[5] Univ Vale Itajai, Lab Embedded & Distributed Syst, BR-88302901 Itajai, SC, Brazil
[6] Univ Lusofona Humanidades & Tecnol, COPELABS, P-1700097 Lisbon, Portugal
Source
IEEE ACCESS | 2020, Vol. 8, Issue 08
Keywords
Cloud computing; Big Data; Computational modeling; Analytical models; Data models; Real-time systems; Big data analytics; hybrid infrastructures; MapReduce; volunteer computing
DOI
10.1109/ACCESS.2020.3023344
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Big Data applications are present in many areas, such as financial markets, search engines, streaming services, health care, and social networks. Data analysis adds value to information for organizations. Classical Cloud Computing provides a robust architecture for performing complex, large-scale computing in these areas. The main challenges are users' lack of knowledge about Cloud infrastructure, the requirements for improving performance, and the resource management needed to maintain stable processing. Given these difficulties, an inadequate solution can lead users to overestimate or underestimate the number of computational resources, which drives up the budget. One way to work around this problem is to use Volunteer Computing, since it provides distributed computational resources at no monetary cost. However, volatile machine behavior is a problem to address when distributing Big Data. Thus, this work proposes a data distribution model that combines Cloud Computing and Volunteer Computing environments in a hybrid fashion for Big Data analytics. The contributions of this work are: i) the evaluation required to enable efficient deployment of Big Data in hybrid infrastructures; ii) the development of the HR_Alloc algorithm for establishing data placement for Big Data applications; iii) a model for resource allocation in hybrid infrastructures. The results obtained indicate the feasibility of using a hybrid infrastructure with up to 35% of unstable machines in the worst-case scenario, without losing performance and with a monetary cost lower than 20% in comparison to Classical Cloud Computing. Also, communication costs decrease by up to 57.14% in the best-case scenario due to load balancing.
Pages: 170281-170294
Page count: 14
Related papers
50 in total
  • [1] Enabling Strategies for Big Data Analytics in Hybrid Infrastructures
    Anjos, Julio C. S.
    Matteussi, Kassiano J.
    De Souza, Paulo R. R.
    Geyer, Claudio F. R.
    Veith, Alexandre S.
    Fedak, Gilles
    Victoria Barbosa, Jorge Luis
    PROCEEDINGS 2018 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS), 2018, : 869 - 876
  • [2] A Reference Architecture for Big Data Solutions Introducing a model to perform predictive analytics using big data technology
    Geerdink, Bas
    2013 8TH INTERNATIONAL CONFERENCE FOR INTERNET TECHNOLOGY AND SECURED TRANSACTIONS (ICITST), 2013, : 66 - 71
  • [3] Managing Big Data through Hybrid Data Infrastructures
    Candela, Leonardo
    Castelli, Donatella
    Pagano, Pasquale
    ERCIM NEWS, 2012, (89): : 37 - 38
  • [4] Toward a Maturity Model for Big Data Analytics: A Roadmap for Complex Data Processing
    Jami Pour, Mona
    Abbasi, Fatemeh
    Sohrabi, Babak
    INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY & DECISION MAKING, 2023, 22 (01) : 377 - 419
  • [5] Computing Platforms for Big Data Analytics in Electric Vehicle Infrastructures
    Hussain, Md Muzakkir
    Beg, M. M. Sufyan
    Alam, Mohammad Saad
    Krishnamurthy, Mahesh
    Ali, Qazi Mazhar
    2018 4TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2018), 2018, : 138 - 143
  • [6] An open compute and data federation as an alternative to monolithic infrastructures for big Earth data analytics
    Backeberg, Bjorn
    Sustr, Zdenek
    Fernandez, Enol
    Donchyts, Gennadii
    Haag, Arjen
    Oonk, J. B. Raymond
    Venekamp, Gerben
    Schumacher, Benjamin
    Reimond, Stefan
    Chatzikyriakou, Charis
    BIG EARTH DATA, 2023, 7 (03) : 812 - 830
  • [7] Big Data Processing and Analytics for Process Industries
    Sarnovsky, Martin
    2018 IEEE 16TH WORLD SYMPOSIUM ON APPLIED MACHINE INTELLIGENCE AND INFORMATICS (SAMI 2018): DEDICATED TO THE MEMORY OF PIONEER OF ROBOTICS ANTAL (TONY) K. BEJCZY, 2018, : 14 - 14
  • [8] Data Quality Alerting Model for Big Data Analytics
    Gyulgyulyan, Eliza
    Aligon, Julien
    Ravat, Franck
    Astsatryan, Hrachya
    NEW TRENDS IN DATABASES AND INFORMATION SYSTEMS, ADBIS 2019, 2019, 1064 : 489 - 500
  • [9] A Reference Model for Big Data Analytics
    Park, Eunjung
    Sugumaran, Vijayan
    Park, Sooyong
    2018 9TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2018, : 382 - 391
  • [10] An Effective Model for Big Data Analytics
    Bokhari, M. U.
    Zeyauddin, Md.
    Siddiqui, Md. Ashraf
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 3980 - 3982