Unicorn: Unified resource orchestration for multi-domain, geo-distributed data analytics

Cited by: 7
Authors
Xiang, Qiao [1, 2]
Wang, X. Tony [1]
Zhang, J. Jensen [1]
Newman, Harvey [4]
Yang, Y. Richard [1, 3]
Liu, Y. Jace [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Yale Univ, Dept Comp Sci, 51 Prospect St, New Haven, CT 06520 USA
[3] Yale Univ, Comp Sci & Elect Engn, New Haven, CT 06520 USA
[4] CALTECH, Phys, Pasadena, CA 91125 USA
Funding
U.S. National Science Foundation; China Postdoctoral Science Foundation
Keywords
24;
DOI
10.1016/j.future.2018.09.048
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As the data volume increases exponentially over time, data-intensive analytics benefits substantially from multi-organizational, geographically-distributed, collaborative computing, where different organizations contribute various yet scarce resources, e.g., computation, storage and networking resources, to collaboratively collect, share and analyze extremely large amounts of data. By analyzing the data analytics trace from the Compact Muon Solenoid (CMS) experiment, one of the largest scientific experiments in the world, and systematically examining the design of existing resource management systems for clusters, we show that the multi-domain, geo-distributed, resource-disaggregated nature of this new paradigm calls for a framework to manage a large set of distributively-owned, heterogeneous resources, with the objective of efficient resource utilization, following the autonomy and privacy of different domains, and that the fundamental challenge for designing such a framework is: how to accurately discover and represent resource availability of a large set of distributively-owned, heterogeneous resources across different domains with minimal information exposure from each domain? Existing resource management systems are designed for single-domain clusters and cannot address this challenge. In this paper, we design Unicorn, the first unified resource orchestration framework for multi-domain, geo-distributed data analytics. In Unicorn, we encode the resource availability for each domain into resource state abstraction, a variant of the network view abstraction extended to accurately represent the availability of multiple resources with minimal information exposure using a set of linear inequalities. We then design a novel, efficient cross-domain query algorithm and a privacy-preserving resource information integration protocol to discover and integrate the accurate, minimal resource availability information for a set of data analytics jobs across different domains. 
In addition, Unicorn contains a global resource orchestrator that computes optimal resource allocation decisions for data analytics jobs. We implement a prototype of Unicorn and present preliminary evaluation results to demonstrate its efficiency and efficacy. We also give a full demonstration of the Unicorn system at SuperComputing 2017. © 2018 Elsevier B.V. All rights reserved.
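The abstract states that each domain's resource availability is encoded as a set of linear inequalities. The sketch below is only an illustration of that general idea, not the paper's actual abstraction or algorithm: the `Constraint` class, the `is_feasible` helper, the job names, and the capacity numbers are all hypothetical. It shows how a domain could expose constraints over per-job allocations without revealing which internal links or devices produce them, and how a candidate allocation can be checked against them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Constraint:
    """One linear inequality: sum(coeffs[job] * allocation[job]) <= bound."""
    coeffs: Dict[str, float]
    bound: float


def is_feasible(constraints: List[Constraint],
                allocation: Dict[str, float],
                tol: float = 1e-9) -> bool:
    """Return True if the allocation satisfies every inequality."""
    for c in constraints:
        # Jobs absent from a constraint contribute nothing to its left side.
        used = sum(c.coeffs.get(job, 0.0) * amount
                   for job, amount in allocation.items())
        if used > c.bound + tol:
            return False
    return True


# Hypothetical view exposed by one domain: both jobs share a resource
# with capacity 100, and job1 is additionally capped at 40.
domain_view = [
    Constraint({"job1": 1.0, "job2": 1.0}, 100.0),
    Constraint({"job1": 1.0}, 40.0),
]

print(is_feasible(domain_view, {"job1": 40.0, "job2": 60.0}))
print(is_feasible(domain_view, {"job1": 50.0, "job2": 30.0}))
```

In this toy view, the first allocation meets both inequalities while the second exceeds the cap on job1. An orchestrator in this style would only need the inequalities, not the topology behind them.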
Pages: 188-197
Number of pages: 10