Unicorn: Unified resource orchestration for multi-domain, geo-distributed data analytics

Cited by: 7
Authors
Xiang, Qiao [1, 2]
Wang, X. Tony [1]
Zhang, J. Jensen [1]
Newman, Harvey [4]
Yang, Y. Richard [1, 3]
Liu, Y. Jace [1]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Yale Univ, Dept Comp Sci, 51 Prospect St, New Haven, CT 06520 USA
[3] Yale Univ, Comp Sci & Elect Engn, New Haven, CT 06520 USA
[4] CALTECH, Phys, Pasadena, CA 91125 USA
Funding
US National Science Foundation; China Postdoctoral Science Foundation
Keywords
24;
DOI
10.1016/j.future.2018.09.048
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
As the data volume increases exponentially over time, data-intensive analytics benefits substantially from multi-organizational, geographically-distributed, collaborative computing, where different organizations contribute various yet scarce resources, e.g., computation, storage and networking resources, to collaboratively collect, share and analyze extremely large amounts of data. By analyzing the data analytics trace from the Compact Muon Solenoid (CMS) experiment, one of the largest scientific experiments in the world, and systematically examining the design of existing resource management systems for clusters, we show that the multi-domain, geo-distributed, resource-disaggregated nature of this new paradigm calls for a framework to manage a large set of distributively-owned, heterogeneous resources, with the objective of efficient resource utilization, following the autonomy and privacy of different domains, and that the fundamental challenge for designing such a framework is: how to accurately discover and represent resource availability of a large set of distributively-owned, heterogeneous resources across different domains with minimal information exposure from each domain? Existing resource management systems are designed for single-domain clusters and cannot address this challenge. In this paper, we design Unicorn, the first unified resource orchestration framework for multi-domain, geo-distributed data analytics. In Unicorn, we encode the resource availability for each domain into resource state abstraction, a variant of the network view abstraction extended to accurately represent the availability of multiple resources with minimal information exposure using a set of linear inequalities. We then design a novel, efficient cross-domain query algorithm and a privacy-preserving resource information integration protocol to discover and integrate the accurate, minimal resource availability information for a set of data analytics jobs across different domains. 
In addition, Unicorn also contains a global resource orchestrator that computes optimal resource allocation decisions for data analytics jobs. We implement a prototype of Unicorn and present preliminary evaluation results to demonstrate its efficiency and efficacy. We also give a full demonstration of the Unicorn system at SuperComputing 2017. (C) 2018 Elsevier B.V. All rights reserved.
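To make the idea of describing resource availability "using a set of linear inequalities" concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each domain publishes only inequalities over per-job allocation variables, so that a candidate allocation can be checked without the domain revealing its internal topology. The names `LinearConstraint`, `merge_domain_views` and `is_feasible`, the job names, and all numbers are hypothetical; the privacy-preserving integration protocol mentioned in the abstract is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LinearConstraint:
    """One inequality of the form sum(coeffs[v] * x[v]) <= bound (hypothetical model)."""
    coeffs: Dict[str, float]
    bound: float

    def holds(self, allocation: Dict[str, float]) -> bool:
        # Variables missing from the allocation are treated as zero.
        lhs = sum(c * allocation.get(var, 0.0) for var, c in self.coeffs.items())
        return lhs <= self.bound + 1e-9


def merge_domain_views(views: List[List[LinearConstraint]]) -> List[LinearConstraint]:
    """Combine the inequalities exposed by several domains into one constraint set."""
    return [constraint for view in views for constraint in view]


def is_feasible(constraints: List[LinearConstraint], allocation: Dict[str, float]) -> bool:
    """An allocation is feasible only if it satisfies every exposed inequality."""
    return all(c.holds(allocation) for c in constraints)


# Made-up example: domain A says two jobs share 100 units of some resource;
# domain B says job1 alone can use at most 40 units.
domain_a = [LinearConstraint({"job1": 1.0, "job2": 1.0}, 100.0)]
domain_b = [LinearConstraint({"job1": 1.0}, 40.0)]
merged = merge_domain_views([domain_a, domain_b])
```

In this sketch, a global orchestrator would only need `merged` to test or optimize allocations; how the inequalities are derived inside each domain stays hidden from it.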
Pages: 188-197
Page count: 10