Bermuda: An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs

Authors
Xiao, Dongqing [1]
Eltabakh, Mohamed [1]
Kong, Xiangnan [1]
Affiliations
[1] Worcester Polytech Inst, 100 Inst Rd, Worcester, MA 01609 USA
Funding
U.S. National Science Foundation
Keywords
Distributed Triangle Listing; MapReduce; Graph Analytics
DOI
10.1145/2949689.2949715
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed, and the triangle listing problem has therefore been studied on several distributed infrastructures, including MapReduce. However, existing algorithms generate and shuffle huge amounts of intermediate data, a large percentage of which is, interestingly, redundant. Inspired by this observation, we present the "Bermuda" method, an efficient MapReduce-based triangle listing technique for massive graphs. Unlike existing approaches, Bermuda effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. As a result, Bermuda achieves orders-of-magnitude speedups and can process larger graphs that other techniques fail to process under the same resources. Bermuda exploits the locality of processing, i.e., the knowledge of which reduce instance will process each graph vertex, to avoid generating redundant messages from mappers to reducers. Bermuda also proposes novel message sharing techniques within each reduce instance to increase the usability of the received messages. We present and analyze several reduce-side caching strategies that dynamically learn the expected access patterns of the shared messages and adaptively deploy the appropriate technique for better sharing. Extensive experiments conducted on real-world large-scale graphs show that Bermuda speeds up triangle listing computations by factors of up to 10x. Moreover, with a relatively small cluster, Bermuda can scale up to large datasets, e.g., the ClueWeb graph dataset (688 GB), while other techniques fail to finish.
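To make the notion of intermediate data in the abstract concrete, the following is a minimal single-machine sketch of the classical node-iterator style of triangle listing, arranged as a "map" step that emits wedge messages and a "reduce" step that checks the closing edge. It is not the Bermuda algorithm and does not reproduce its redundancy elimination or caching strategies. The function name `list_triangles`, the degree-based ordering, and the returned wedge count are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def list_triangles(edges):
    """Return (triangles, wedge_count) for an undirected edge list.

    Illustrative baseline only; vertex ids are assumed to be mutually
    comparable (e.g. all ints or all strings).
    """
    # Build undirected adjacency sets, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    def rank(v):
        return (len(adj[v]), v)

    # "Map" step: each vertex emits one message (a wedge) per pair of
    # higher-ranked neighbours. These wedges play the role of the
    # intermediate data that would be shuffled to reducers.
    wedges = []
    for v in adj:
        higher = sorted((w for w in adj[v] if rank(w) > rank(v)), key=rank)
        for a, b in combinations(higher, 2):
            wedges.append((v, a, b))

    # "Reduce" step: a wedge (v, a, b) is a triangle if edge (a, b) exists.
    triangles = sorted(
        tuple(sorted((v, a, b))) for v, a, b in wedges if b in adj[a]
    )
    return triangles, len(wedges)


if __name__ == "__main__":
    tris, n_wedges = list_triangles([(1, 2), (2, 3), (1, 3), (3, 4)])
    print(tris, n_wedges)
```

Because every wedge is anchored at its lowest-ranked vertex, each triangle is reported once. The wedge count gives a rough feel for how much intermediate data a baseline of this kind produces relative to the number of triangles found.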
Pages: 12