An Efficient MapReduce Algorithm for Counting Triangles in a Very Large Graph

Cited by: 30
Authors
Park, Ha-Myung [1]
Chung, Chin-Wan [2]
Affiliations
[1] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, 291 Daehak Ro, Daejeon, South Korea
[2] Korea Adv Inst Sci & Technol, Div Web Sci & Technol, Dept Comp Sci, Daejeon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Graph; triangle; MapReduce;
DOI
10.1145/2505515.2505563
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The triangle counting problem is one of the fundamental problems in various domains. Its solution is used to compute the clustering coefficient, transitivity, triangular connectivity, trusses, etc. The problem has been extensively studied in internal memory, but those algorithms do not scale to enormous graphs. In recent years, MapReduce has emerged as a de facto standard framework for processing large data through parallel computing. A MapReduce algorithm based on graph partitioning was proposed for the problem. However, that algorithm redundantly generates a large amount of intermediate data, which causes network overload and prolongs the processing time. In this paper, we propose a new algorithm based on graph partitioning with a novel idea of triangle classification to count the number of triangles in a graph. The algorithm substantially reduces the duplication by classifying triangles into three types and processing each triangle differently according to its type. In the experiments, we compare the proposed algorithm with recent existing algorithms using both synthetic and real-world datasets composed of millions of nodes and billions of edges. The proposed algorithm outperforms the other algorithms in most cases. In particular, on a Twitter dataset, the proposed algorithm is more than twice as fast as existing MapReduce algorithms. Moreover, the performance gap increases as the graph becomes larger and denser.
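The abstract does not say how the three triangle types are defined. As a minimal single-machine sketch, the Python below assumes a triangle's type is the number of distinct partitions its three vertices fall into (1, 2, or 3). The function name `classify_triangles` and the modulo partition function are hypothetical, chosen only for illustration. This is not the paper's MapReduce algorithm, only an in-memory illustration of classifying triangles under a node partitioning.

```python
from collections import Counter


def classify_triangles(edges, num_partitions):
    """Count triangles grouped by how many distinct partitions they span.

    Assumptions (not taken from the abstract): node IDs are integers,
    a node's partition is `node % num_partitions`, and a triangle's
    "type" is the number of distinct partitions among its vertices.
    """
    # Build an undirected adjacency map, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    counts = Counter()
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbours of u and v close a triangle.
                for w in adj[u] & adj[v]:
                    if v < w:  # enforce u < v < w so each triangle counts once
                        parts = {x % num_partitions for x in (u, v, w)}
                        counts[len(parts)] += 1
    return counts


if __name__ == "__main__":
    # Two triangles sharing node 0: (0, 3, 6) and (0, 1, 2).
    edges = [(0, 3), (3, 6), (0, 6), (0, 1), (1, 2), (0, 2)]
    result = classify_triangles(edges, num_partitions=3)
    print(dict(result), "total:", sum(result.values()))
```

In a partition-based distributed setting, a triangle whose vertices span several partitions can be discovered by more than one subproblem, which is the kind of duplication the abstract says the classification is meant to reduce.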
Pages: 539-548
Number of pages: 10