Bermuda: An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs

Authors
Xiao, Dongqing [1]
Eltabakh, Mohamed [1]
Kong, Xiangnan [1]
Affiliations
[1] Worcester Polytech Inst, 100 Inst Rd, Worcester, MA 01609 USA
Funding
National Science Foundation (USA)
Keywords
Distributed Triangle Listing; MapReduce; Graph Analytics;
DOI
10.1145/2949689.2949715
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed. The triangle listing problem has therefore been studied in several distributed infrastructures, including MapReduce. However, existing algorithms generate and shuffle huge amounts of intermediate data, a large percentage of which is, interestingly, redundant. Inspired by this observation, we present the "Bermuda" method, an efficient MapReduce-based triangle listing technique for massive graphs. Unlike existing approaches, Bermuda effectively reduces the size of the intermediate data via redundancy elimination and sharing of messages whenever possible. As a result, Bermuda achieves orders-of-magnitude speedup and enables processing larger graphs that other techniques fail to process under the same resources. Bermuda exploits the locality of processing, i.e., the reduce instance in which each graph vertex will be processed, to avoid redundancy in the messages generated from mappers to reducers. Bermuda also proposes novel message sharing techniques within each reduce instance to increase the usability of the received messages. We present and analyze several reduce-side caching strategies that dynamically learn the expected access patterns of the shared messages and adaptively deploy the appropriate technique for better sharing. Extensive experiments conducted on real-world large-scale graphs show that Bermuda speeds up triangle listing computations by factors of up to 10x. Moreover, with a relatively small cluster, Bermuda can scale up to large datasets, e.g., the ClueWeb graph dataset (688GB), while other techniques fail to finish.
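To make the intermediate-data problem described in the abstract concrete, the following is a minimal single-machine sketch of a generic wedge-based triangle listing scheme written in a map/reduce style. It is not the Bermuda algorithm and is not taken from the paper; the function name `list_triangles`, the degree-based vertex ordering, and the returned wedge count are illustrative assumptions. The wedge count stands in for the volume of messages a baseline of this kind would shuffle from mappers to reducers.

```python
from collections import defaultdict
from itertools import combinations


def list_triangles(edges):
    """Hypothetical baseline: return (triangles, number_of_wedges_emitted)."""
    # Build adjacency sets from an undirected edge list, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    def rank(v):
        return (len(adj[v]), v)

    # "Map" phase: each vertex emits one wedge per pair of its
    # higher-ranked neighbors, keyed by the pair that would close it.
    wedges = defaultdict(list)
    for v in list(adj):
        higher = sorted((w for w in adj[v] if rank(w) > rank(v)), key=rank)
        for a, b in combinations(higher, 2):
            wedges[(a, b)].append(v)

    # "Reduce" phase: a wedge keyed by (a, b) is a triangle
    # exactly when the edge (a, b) exists in the graph.
    triangles = []
    for (a, b), pivots in wedges.items():
        if b in adj[a]:
            for v in pivots:
                triangles.append(tuple(sorted((v, a, b))))

    emitted = sum(len(p) for p in wedges.values())
    return sorted(triangles), emitted


if __name__ == "__main__":
    # Two triangles sharing the edge (2, 3).
    sample = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
    print(list_triangles(sample))
```

In this sketch every wedge is emitted as a separate message, including wedges that never close into a triangle. That is the kind of intermediate data the abstract says existing MapReduce approaches generate and shuffle in bulk.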
Pages: 12