TomusBlobs: scalable data-intensive processing on Azure clouds

Cited by: 3
Authors
Costan, Alexandru [1]
Tudoran, Radu [1]
Antoniu, Gabriel [1]
Brasche, Goetz [2]
Affiliations
[1] Inria Rennes Bretagne Atlant, Campus Beaulieu, F-35042 Rennes, France
[2] EMIC, Microsoft Adv Technol Labs Europe, Ritterstr 23, D-52072 Aachen, Germany
Source
Keywords
big data; cloud computing; data-intensive processing; cloud storage; MapReduce; scientific applications; Azure
DOI
10.1002/cpe.3034
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The emergence of cloud computing has brought the opportunity to use large-scale compute infrastructures for an ever broader spectrum of applications and users. While the cloud paradigm is attractive for its elasticity in resource usage and associated costs (users pay only for the resources actually used), cloud applications still suffer from the high latencies and low performance of cloud storage services. As Big Data analysis on clouds becomes increasingly relevant in many application areas, enabling high-throughput massive data processing on cloud data becomes a critical issue, because it impacts overall application performance. In this paper, we address this challenge at the level of cloud storage. We introduce a concurrency-optimized data storage system, called TomusBlobs, which federates the virtual disks associated with the virtual machines running the application code on the cloud. We demonstrate the performance benefits of our solution for efficient data-intensive processing by building an optimized prototype MapReduce framework for Microsoft's Azure cloud platform on top of TomusBlobs. Finally, we specifically address the limitations of state-of-the-art MapReduce frameworks for reduce-intensive workloads by proposing MapIterativeReduce as an extension of the MapReduce model. We validate these contributions through large-scale experiments with synthetic benchmarks and with real-world applications on the Azure commercial cloud, using resources distributed across multiple data centers; the experiments demonstrate that our solutions bring substantial benefits to data-intensive applications compared with approaches relying on state-of-the-art cloud object storage. Copyright (c) 2013 John Wiley & Sons, Ltd.
Pages: 950-976
Number of pages: 27
Related papers
50 in total
  • [1] Data-intensive workflow management: For clouds and data-intensive and scalable computing environments
    De Oliveira, Daniel C.M.
    Liu, Ji
    Pacitti, Esther
    Synthesis Lectures on Data Management, 2019, 14 (04) : 1 - 179
  • [2] A scalable architecture for data-intensive natural language processing
    Beloki, Zuhaitz
    Artola, Xabier
    Soroa, Aitor
    NATURAL LANGUAGE ENGINEERING, 2017, 23 (05) : 709 - 731
  • [3] Scalable Data Placement of Data-intensive Services in Geo-distributed Clouds
    Atrey, Ankita
    Van Seghbroeck, Gregory
    Volckaert, Bruno
    De Turck, Filip
    CLOSER: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2018 : 497 - 508
  • [4] Scalable Data-Intensive Analytics
    Hsu, Meichun
    Chen, Qiming
    BUSINESS INTELLIGENCE FOR THE REAL-TIME ENTERPRISE, 2009, 27 : 97 - +
  • [5] SpeCH: A scalable framework for data placement of data-intensive services in geo-distributed clouds
    Atrey, Ankita
    Van Seghbroeck, Gregory
    Mora, Higinio
    De Turck, Filip
    Volckaert, Bruno
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2019, 142 : 1 - 14
  • [6] Data-intensive Spatial Indexing on the Clouds
    Rezgui, Abdelmounaam
    Malik, Zaki
    Xia, Jizhe
    Liu, Kai
    Yang, Chaowei
    2013 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2013, 18 : 2615 - 2618
  • [7] Coordinating Green Clouds as Data-Intensive Computing
    Biran, Yahav
    Collins, George
    Liberatore, Joseph
    PROCEEDINGS 2016 EIGHTH ANNUAL IEEE GREEN TECHNOLOGIES CONFERENCE (GREENTECH 2016), 2016 : 130 - 135
  • [8] Data-Intensive Scalable Computing for Scientific Applications
    Bryant, Randal E.
    COMPUTING IN SCIENCE & ENGINEERING, 2011, 13 (06) : 25 - 33
  • [9] Automated Debugging in Data-Intensive Scalable Computing
    Gulzar, Muhammad Ali
    Interlandi, Matteo
    Han, Xueyuan
    Li, Mingda
    Condie, Tyson
    Kim, Miryung
    PROCEEDINGS OF THE 2017 SYMPOSIUM ON CLOUD COMPUTING (SOCC '17), 2017 : 520 - 534
  • [10] Data-Intensive Text Processing with MapReduce
    Xu, Peng
    COMPUTATIONAL LINGUISTICS, 2011, 37 (03) : 635 - 637