Efficient Graph Query Processing over Geo-Distributed Datacenters

Times cited: 7
Authors
Yuan, Ye [1]
Ma, Delong [2]
Wen, Zhenyu [3]
Ma, Yuliang [2]
Wang, Guoren [1]
Chen, Lei [4]
Institutions
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Northeastern Univ, Shenyang, Peoples R China
[3] Newcastle Univ, Newcastle Upon Tyne, Tyne & Wear, England
[4] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords
Graph search; Geo-distributed; Datacenters; MapReduce
DOI
10.1145/3397271.3401157
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Graph queries have emerged as one of the fundamental techniques supporting modern search services, such as PageRank-based web search, social network search, and knowledge graph search. Because such graphs are maintained globally and are very large (e.g., billions of nodes), graph queries must be processed efficiently across multiple geographically distributed datacenters, i.e., as geo-distributed graph queries. Existing graph computing frameworks may not work well across geo-distributed datacenters because they implement the Bulk Synchronous Parallel (BSP) model, which requires excessive inter-datacenter transfers and therefore introduces very large query-processing latency. In this paper, we propose GeoGraph, a universal framework that supports efficient geo-distributed graph query processing based on datacenter clustering and a meta-graph, while reducing inter-datacenter communication. The framework can be applied to many types of graph algorithms without any modification, and it is built on top of Apache Giraph. The experiments use four important graph queries: shortest path, graph keyword search, subgraph isomorphism, and PageRank. With input graphs stored across ten geo-distributed datacenters, the evaluation results show that GeoGraph achieves up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% lower total monetary cost for the four graph queries.
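The abstract attributes the latency of existing frameworks to the BSP model, in which every superstep exchanges messages along graph edges before a global barrier. The following minimal sketch illustrates that reasoning only; it is not GeoGraph's algorithm. It runs single-source shortest paths in BSP supersteps over a graph whose vertices are assigned to datacenters and counts how many messages cross a datacenter boundary. The function name `bsp_sssp`, the `placement` mapping, and the four-vertex example graph are hypothetical assumptions introduced for illustration.

```python
import math
from collections import defaultdict


def bsp_sssp(edges, placement, source):
    """Single-source shortest paths executed in BSP supersteps (illustrative sketch).

    edges:     vertex -> list of (neighbor, weight) pairs
    placement: vertex -> datacenter id (hypothetical partitioning)
    Returns (distances, number of supersteps, cross-datacenter messages).
    """
    dist = {source: 0}
    active = {source}
    supersteps = 0
    cross_dc_messages = 0
    while active:
        supersteps += 1
        inbox = defaultdict(list)
        # Compute phase: every active vertex sends a tentative distance
        # along each outgoing edge.
        for u in active:
            for v, w in edges.get(u, []):
                inbox[v].append(dist[u] + w)
                if placement[u] != placement[v]:
                    cross_dc_messages += 1  # in a geo-distributed setting this would cross the WAN
        # Barrier: messages are applied only after all sends are done,
        # so every vertex in this superstep read the same snapshot of dist.
        active = set()
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < dist.get(v, math.inf):
                dist[v] = best
                active.add(v)
    return dist, supersteps, cross_dc_messages


# Hypothetical example: four vertices split across two datacenters.
edges = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 1), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
placement = {"a": "dc1", "b": "dc1", "c": "dc2", "d": "dc2"}

dist, steps, cross = bsp_sssp(edges, placement, "a")
print(dist, steps, cross)
```

In this toy run, the distances converge to a=0, b=1, c=2, d=3 after four supersteps, and three of the messages cross from dc1 to dc2; placing all four vertices in one datacenter drops that count to zero. Each cross-datacenter message stands in for the inter-datacenter traffic that the abstract identifies as the bottleneck. The sketch does not model GeoGraph's datacenter clustering or meta-graph.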
Pages: 619-628
Page count: 10