TomusBlobs: scalable data-intensive processing on Azure clouds

Cited by: 3
Authors
Costan, Alexandru [1]
Tudoran, Radu [1]
Antoniu, Gabriel [1]
Brasche, Goetz [2]
Affiliations
[1] Inria Rennes Bretagne Atlant, Campus Beaulieu, F-35042 Rennes, France
[2] EMIC, Microsoft Adv Technol Labs Europe, Ritterstr 23, D-52072 Aachen, Germany
Keywords
big data; cloud computing; data-intensive processing; cloud storage; MapReduce; scientific applications; Azure
DOI
10.1002/cpe.3034
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for an increasingly broad spectrum of applications and users. Although the cloud paradigm is attractive for its elasticity in resource usage and associated costs (users pay only for the resources actually used), cloud applications still suffer from the high latencies and low performance of cloud storage services. As Big Data analysis on clouds becomes more and more relevant in many application areas, enabling high-throughput massive data processing on cloud data becomes a critical issue, as it impacts the overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system (called TomusBlobs), which federates the virtual disks associated with the Virtual Machines running the application code on the cloud. We demonstrate the performance benefits of our solution for efficient data-intensive processing by building an optimized prototype MapReduce framework for Microsoft's Azure cloud platform on the basis of TomusBlobs. Finally, we specifically address the limitations of state-of-the-art MapReduce frameworks for reduce-intensive workloads by proposing MapIterativeReduce as an extension of the MapReduce model. We validate these contributions through large-scale experiments with synthetic benchmarks and with real-world applications on the Azure commercial cloud, using resources distributed across multiple data centers; the experiments demonstrate that our solutions bring substantial benefits to data-intensive applications compared with approaches relying on state-of-the-art cloud object storage. Copyright (c) 2013 John Wiley & Sons, Ltd.
Pages: 950-976
Number of pages: 27