Practising Scalable Graph Similarity Joins in MapReduce

Times cited: 1
Authors
Chen, Yifan [1]
Zhao, Xiang [1]
Ge, Bin [1]
Xiao, Chuan [2]
Chi, Chi-Hung [3]
Affiliations
[1] National University of Defense Technology, Science and Technology on Information Systems Engineering Laboratory, Changsha 410073, Hunan, People's Republic of China
[2] Nagoya University, Nagoya, Aichi 464-8601, Japan
[3] CSIRO, Clayton, VIC, Australia
Keywords
Graph similarity join; MapReduce; Bloom filter; Multiway join
DOI
10.1109/BigData.Congress.2014.25
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
With the emergence of massive graph-modeled data, graph similarity joins have become increasingly important, with applications such as data cleaning and near-duplicate detection. This paper considers graph similarity joins under edit distance constraints, which return all pairs of graphs whose graph edit distance is no larger than a given threshold. Leveraging the MapReduce programming model, we propose MGSJoin, a scalable algorithm that follows the filtering-verification framework for efficient graph similarity joins. It prunes non-promising candidate pairs by counting overlapping graph signatures. Because the filtering phase can generate too many key-value pairs, spectral Bloom filters are introduced to reduce their number. Furthermore, we integrate a multiway join strategy to speed up verification. Extensive experimental results demonstrate the superior efficiency and scalability of the proposed algorithms.
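The abstract describes filtering by counting overlapping graph signatures. The paper's actual signatures are not specified here, so the following is only a sketch under assumptions: it uses vertex-label and edge-label multisets as signatures and simulates one map, shuffle and reduce round in plain Python. The pruning rule is the standard label-multiset lower bound on graph edit distance with unit costs, ged(g1, g2) >= (max(|V1|, |V2|) - |L_V(g1) ∩ L_V(g2)|) + (max(|E1|, |E2|) - |L_E(g1) ∩ L_E(g2)|), so a pair survives only if this bound is at most tau. All function names are hypothetical, pairs that share no signature are assumed to be non-results, and the verification phase (exact edit distance on the survivors) is omitted.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy representation: a graph is reduced to the two multisets
# this filter needs, (vertex_labels, edge_labels).


def map_phase(gid, graph):
    """Map: emit (signature, (graph id, multiplicity)) for each distinct label."""
    vertex_labels, edge_labels = graph
    for label, count in Counter(vertex_labels).items():
        yield "v:" + label, (gid, count)
    for label, count in Counter(edge_labels).items():
        yield "e:" + label, (gid, count)


def shuffle(pairs):
    """Shuffle: group values by key, as the MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_overlap(key, postings):
    """Reduce: every pair sharing this signature gets min(multiplicities) of overlap."""
    kind = key[0]  # 'v' for vertex labels, 'e' for edge labels
    for (gi, ci), (gj, cj) in combinations(sorted(postings), 2):
        yield (gi, gj), (kind, min(ci, cj))


def similarity_join_candidates(graphs, tau):
    """Return (gi, gj, lower_bound) for pairs whose label-based GED bound is <= tau.

    Assumes every graph has |V| + |E| > tau, so a pair that shares no
    signature (and therefore never meets in a reducer) cannot be a result.
    """
    mapped = (kv for gid, g in graphs.items() for kv in map_phase(gid, g))
    partial = (kv for key, post in shuffle(mapped).items()
               for kv in reduce_overlap(key, post))
    candidates = []
    for (gi, gj), parts in sorted(shuffle(partial).items()):
        common_v = sum(c for kind, c in parts if kind == "v")
        common_e = sum(c for kind, c in parts if kind == "e")
        (vi, ei), (vj, ej) = graphs[gi], graphs[gj]
        bound = (max(len(vi), len(vj)) - common_v) + (max(len(ei), len(ej)) - common_e)
        if bound <= tau:  # otherwise pruned without computing the edit distance
            candidates.append((gi, gj, bound))
    return candidates


example_graphs = {
    "g1": (["C", "C", "O"], ["s", "s"]),
    "g2": (["C", "C", "N"], ["s", "s"]),
    "g3": (["N", "N", "N", "N"], ["d", "d", "d"]),
}
result = similarity_join_candidates(example_graphs, tau=1)
print(result)  # → [('g1', 'g2', 1)]
```

In a real Hadoop job, `map_phase` and `reduce_overlap` would be the mapper and reducer, and the final summation loop a second MapReduce round; this in-memory simulation only illustrates the data flow. Note that every shared signature produces one key-value pair per graph pair, which is the blow-up the abstract attributes to the filtering phase.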
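The abstract mentions spectral Bloom filters as a way to reduce the number of key-value pairs in the filtering phase, without stating how MGSJoin wires them in. The sketch below assumes one plausible use: a graph's whole signature multiset is summarized into a single spectral Bloom filter (an array of m counters; each key increments k of them, and its multiplicity is estimated as the minimum of those k counters). Because that estimate never falls below the true count, summing min(c, estimate) over another graph's signatures yields an upper bound on their overlap, so a count filter built on it can still prune pairs but never drops a true result. The class and function names, m=64, k=3 and the SHA-256-based hashing are illustrative assumptions.

```python
import hashlib
from collections import Counter


class SpectralBloomFilter:
    """m counters and k hash functions; frequency estimated by minimum selection."""

    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, key):
        # k deterministic positions: SHA-256 of the key with a per-function prefix
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key, count=1):
        for pos in self._positions(key):
            self.counters[pos] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so this never underestimates.
        return min(self.counters[pos] for pos in self._positions(key))


def sketch(signatures, m=64, k=3):
    """Summarize one graph's signature multiset as a single filter (one value to ship)."""
    sbf = SpectralBloomFilter(m, k)
    for sig, count in Counter(signatures).items():
        sbf.add(sig, count)
    return sbf


def overlap_upper_bound(signatures_i, sbf_j):
    """Upper bound on the multiset overlap of graph i with the graph behind sbf_j."""
    return sum(min(c, sbf_j.estimate(s)) for s, c in Counter(signatures_i).items())


g1_sigs = ["v:C", "v:C", "v:O", "e:s", "e:s"]
g2_sigs = ["v:C", "v:C", "v:N", "e:s", "e:s"]
sbf_g1 = sketch(g1_sigs)
bound = overlap_upper_bound(g2_sigs, sbf_g1)
print(bound)  # true overlap is 4; hash collisions can only raise this to 5
```

The trade-off is the usual one for Bloom-style sketches: a smaller m means fewer bytes per graph but more collisions, which loosens the bound and sends more false candidates to verification; it never causes false negatives.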
Pages: 112–119
Page count: 8
Related papers
  • [1] Efficient and Scalable Graph Similarity Joins in MapReduce
    Chen, Yifan
    Zhao, Xiang
    Xiao, Chuan
    Zhang, Weiming
    Tang, Jiuyang
    Scientific World Journal, 2014
  • [2] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    Journal of Intelligent Information Systems, 2016, 46(3): 473–497
  • [3] MassJoin: A MapReduce-based Method for Scalable String Similarity Joins
    Deng, Dong
    Li, Guoliang
    Hao, Shuang
    Wang, Jiannan
    Feng, Jianhua
    2014 IEEE 30th International Conference on Data Engineering (ICDE), 2014: 340–351
  • [4] Metric Similarity Joins Using MapReduce
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    IEEE Transactions on Knowledge and Data Engineering, 2017, 29(3): 656–669
  • [5] Set Similarity Joins on MapReduce: An Experimental Survey
    Fier, Fabian
    Augsten, Nikolaus
    Bouros, Panagiotis
    Leser, Ulf
    Freytag, Johann-Christoph
    Proceedings of the VLDB Endowment, 2018, 11(10): 1110–1122
  • [6] Privacy preserving similarity joins using MapReduce
    Ding, Xiaofeng
    Yang, Wanlu
    Choo, Kim-Kwang Raymond
    Wang, Xiaoli
    Jin, Hai
    Information Sciences, 2019, 493: 20–33
  • [7] Efficient Graph Similarity Join with Scalable Prefix-Filtering Using MapReduce
    Pang, Jun
    Gu, Yu
    Xu, Jia
    Bao, Yubin
    Yu, Ge
    Web-Age Information Management, WAIM 2014, 2014, 8485: 415–418
  • [8] Scalable Similarity Joins of Tokenized Strings
    Metwally, Ahmed
    Huang, Chun-Heng
    2019 IEEE 35th International Conference on Data Engineering (ICDE 2019), 2019: 1766–1777
  • [9] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    2018 IEEE 34th International Conference on Data Engineering (ICDE), 2018: 1787–1788