TomusBlobs: scalable data-intensive processing on Azure clouds

Cited by: 3
Authors
Costan, Alexandru [1]
Tudoran, Radu [1]
Antoniu, Gabriel [1]
Brasche, Goetz [2]
Affiliations
[1] Inria Rennes Bretagne Atlant, Campus Beaulieu, F-35042 Rennes, France
[2] EMIC, Microsoft Adv Technol Labs Europe, Ritterstr 23, D-52072 Aachen, Germany
Source
Keywords
big data; cloud computing; data-intensive processing; cloud storage; MapReduce; scientific applications; Azure
DOI
10.1002/cpe.3034
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for an ever broader spectrum of applications and users. While the cloud paradigm is attractive for its elasticity in resource usage and associated costs (users pay only for the resources actually used), cloud applications still suffer from the high latencies and low performance of cloud storage services. As Big Data analysis on clouds becomes more and more relevant in many application areas, enabling high-throughput massive data processing on cloud data becomes a critical issue, as it impacts the overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system (called TomusBlobs), which federates the virtual disks associated with the virtual machines running the application code on the cloud. We demonstrate the performance benefits of our solution for efficient data-intensive processing by building an optimized prototype MapReduce framework for Microsoft's Azure cloud platform on the basis of TomusBlobs. Finally, we specifically address the limitations of state-of-the-art MapReduce frameworks for reduce-intensive workloads by proposing MapIterativeReduce as an extension of the MapReduce model. We validate these contributions through large-scale experiments with synthetic benchmarks and with real-world applications on the Azure commercial cloud, using resources distributed across multiple data centers; the experiments demonstrate that our solutions bring substantial benefits to data-intensive applications compared with approaches relying on state-of-the-art cloud object storage. Copyright (c) 2013 John Wiley & Sons, Ltd.
Pages: 950-976
Page count: 27