Unicorn: Unified resource orchestration for multi-domain, geo-distributed data analytics

Cited by: 7
Authors
Xiang, Qiao [1, 2]
Wang, X. Tony [1]
Zhang, J. Jensen [1]
Newman, Harvey [4]
Yang, Y. Richard [1, 3]
Liu, Y. Jace [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Yale Univ, Dept Comp Sci, 51 Prospect St, New Haven, CT 06520 USA
[3] Yale Univ, Comp Sci & Elect Engn, New Haven, CT 06520 USA
[4] CALTECH, Phys, Pasadena, CA 91125 USA
Funding
US National Science Foundation; China Postdoctoral Science Foundation
Keywords
24;
DOI
10.1016/j.future.2018.09.048
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
As the data volume increases exponentially over time, data-intensive analytics benefits substantially from multi-organizational, geographically-distributed, collaborative computing, where different organizations contribute various yet scarce resources, e.g., computation, storage and networking resources, to collaboratively collect, share and analyze extremely large amounts of data. By analyzing the data analytics trace from the Compact Muon Solenoid (CMS) experiment, one of the largest scientific experiments in the world, and systematically examining the design of existing resource management systems for clusters, we show that the multi-domain, geo-distributed, resource-disaggregated nature of this new paradigm calls for a framework to manage a large set of distributively-owned, heterogeneous resources, with the objective of efficient resource utilization, following the autonomy and privacy of different domains, and that the fundamental challenge for designing such a framework is: how to accurately discover and represent resource availability of a large set of distributively-owned, heterogeneous resources across different domains with minimal information exposure from each domain? Existing resource management systems are designed for single-domain clusters and cannot address this challenge. In this paper, we design Unicorn, the first unified resource orchestration framework for multi-domain, geo-distributed data analytics. In Unicorn, we encode the resource availability for each domain into resource state abstraction, a variant of the network view abstraction extended to accurately represent the availability of multiple resources with minimal information exposure using a set of linear inequalities. We then design a novel, efficient cross-domain query algorithm and a privacy-preserving resource information integration protocol to discover and integrate the accurate, minimal resource availability information for a set of data analytics jobs across different domains. 
In addition, Unicorn contains a global resource orchestrator that computes optimal resource allocation decisions for data analytics jobs. We implement a prototype of Unicorn and present preliminary evaluation results to demonstrate its efficiency and efficacy. We also give a full demonstration of the Unicorn system at SuperComputing 2017. © 2018 Elsevier B.V. All rights reserved.
Pages: 188-197
Number of pages: 10
Related papers
50 in total
  • [1] Unicorn: Unified Resource Orchestration for Multi-Domain, Geo-Distributed Data Analytics
    Xiang, Qiao
    Chen, Shenshen
    Gao, Kai
    Newman, Harvey
    Taylor, Ian
    Zhang, Jingxuan
    Yang, Yang Richard
    2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2017,
  • [2] Low Latency Geo-distributed Data Analytics
    Pu, Qifan
    Ananthanarayanan, Ganesh
    Bodik, Peter
    Kandula, Srikanth
    Akella, Aditya
    Bahl, Paramvir
    Stoica, Ion
    ACM SIGCOMM COMPUTER COMMUNICATION REVIEW, 2015, 45 (04) : 421 - 434
  • [3] Low Latency Geo-distributed Data Analytics
    Pu, Qifan
    Ananthanarayanan, Ganesh
    Bodik, Peter
    Kandula, Srikanth
    Akella, Aditya
    Bahl, Paramvir
    Stoica, Ion
    SIGCOMM'15: PROCEEDINGS OF THE 2015 ACM CONFERENCE ON SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2015, : 421 - 434
  • [4] Multi-Objective Optimizations in Geo-Distributed Data Analytics Systems
    Niu, Zhaojie
    He, Bingsheng
    Zhou, Amelie Chi
    Tong, Lau Chiew
    2017 IEEE 23RD INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2017, : 519 - 528
  • [5] Demeter: Fine-grained Function Orchestration for Geo-distributed Serverless Analytics
    Yue, Xiaofei
    Yang, Song
    Zhu, Liehuang
    Trajanovski, Stojan
    Fu, Xiaoming
    IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 2498 - 2507
  • [6] WANalytics: Geo-Distributed Analytics for a Data Intensive World
    Vulimiri, Ashish
    Curino, Carlo
    Godfrey, P. Brighten
    Jungblut, Thomas
    Karanasos, Konstantinos
    Padhye, Jitu
    Varghese, George
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 1087 - 1092
  • [7] Bohr: Similarity Aware Geo-Distributed Data Analytics
    Li, Hangyu
    Xu, Hong
    Nutanong, Sarana
    CONEXT'18: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND TECHNOLOGIES, 2018, : 267 - 279
  • [8] Optimal Query Plans for Geo-distributed Data Analytics at Scale
    Pradhan, Ahana
    Karthik, Srinivas
    Subramanya, Raghunandan
    PROCEEDINGS OF 7TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE AND MANAGEMENT OF DATA, CODS-COMAD 2024, 2024, : 247 - 251
  • [9] Plexus: Optimizing Join Approximation for Geo-Distributed Data Analytics
    Wolfrath, Joel
    Chandra, Abhishek
    PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON CLOUD COMPUTING, SOCC 2023, 2023, : 1 - 16
  • [10] Fast, scalable and geo-distributed PCA for big data analytics
    Adnan, T. M. Tariq
    Tanjim, Md Mehrab
    Adnan, Muhammad Abdullah
    INFORMATION SYSTEMS, 2021, 98 (98)