TomusBlobs: scalable data-intensive processing on Azure clouds

Cited by: 3
Authors
Costan, Alexandru [1]
Tudoran, Radu [1]
Antoniu, Gabriel [1]
Brasche, Goetz [2]
Affiliations
[1] Inria Rennes Bretagne Atlantique, Campus Beaulieu, F-35042 Rennes, France
[2] EMIC, Microsoft Advanced Technology Labs Europe, Ritterstr. 23, D-52072 Aachen, Germany
Source
Keywords
big data; cloud computing; data-intensive processing; cloud storage; MapReduce; scientific applications; Azure
DOI
10.1002/cpe.3034
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification code
081202; 0835
Abstract
The emergence of cloud computing has made large-scale compute infrastructures available to an ever broader spectrum of applications and users. While the cloud paradigm is attractive for its elasticity in resource usage and associated costs (users pay only for the resources they actually use), cloud applications still suffer from the high latency and low performance of cloud storage services. As Big Data analysis on clouds becomes relevant in more and more application areas, enabling high-throughput processing of massive data sets stored in the cloud becomes a critical issue, since it directly affects overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce TomusBlobs, a concurrency-optimized data storage system that federates the virtual disks associated with the virtual machines running the application code in the cloud. We demonstrate the performance benefits of our solution for efficient data-intensive processing by building, on top of TomusBlobs, an optimized prototype MapReduce framework for Microsoft's Azure cloud platform. Finally, we specifically address the limitations of state-of-the-art MapReduce frameworks for reduce-intensive workloads by proposing MapIterativeReduce, an extension of the MapReduce model. We validate these contributions through large-scale experiments on the Azure commercial cloud, using synthetic benchmarks and real-world applications with resources distributed across multiple data centers. The results show that our solutions bring substantial benefits to data-intensive applications compared with approaches relying on state-of-the-art cloud object storage. Copyright (c) 2013 John Wiley & Sons, Ltd.
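The abstract names MapIterativeReduce without describing its mechanics. As a rough, single-process illustration only, the Python sketch below assumes one plausible reading: intermediate results produced by the map phase are reduced in successive rounds, each "reducer" combining a small group of partial results, until a single final result remains. This only works if the reduce function is associative and commutative, which is an assumption here. The names `map_phase`, `iterative_reduce`, `merge_counts` and the `fan_in` parameter are hypothetical and are not taken from the paper's framework; the sketch does not model TomusBlobs storage or Azure deployment.

```python
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")


def map_phase(splits: List[str]) -> List[Counter]:
    """Hypothetical map step: each input split yields one partial word count."""
    return [Counter(split.split()) for split in splits]


def iterative_reduce(partials: List[T], combine: Callable[[T, T], T], fan_in: int = 2) -> T:
    """Reduce partial results in successive rounds until one result remains.

    Assumes `combine` is associative and commutative, so the way partial
    results are grouped into rounds does not change the final answer.
    """
    if not partials:
        raise ValueError("nothing to reduce")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(partials)
    while len(level) > 1:
        next_level = []
        # Each group of up to `fan_in` items stands in for one reducer task in this round.
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            acc = group[0]
            for item in group[1:]:
                acc = combine(acc, item)
            next_level.append(acc)
        level = next_level
    return level[0]


def merge_counts(a: Counter, b: Counter) -> Counter:
    """Associative, commutative merge of two partial word counts."""
    return a + b


if __name__ == "__main__":
    splits = ["a b a", "b c", "a c c", "d"]
    print(iterative_reduce(map_phase(splits), merge_counts, fan_in=2))
```

With four partial results and `fan_in=2`, the reduction takes two rounds (4 to 2 to 1). A larger `fan_in` means fewer rounds but more work per reducer, which is the kind of trade-off an iterative reduce phase has to balance.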
Pages: 950-976
Number of pages: 27