MapReduce-based entity matching with multiple blocking functions

Cited by: 0
Authors
Cheqing Jin
Jie Chen
Huiping Liu
Affiliation
[1] East China Normal University, Institute for Data Science and Engineering, School of Computer Science and Software Engineering
Keywords
entity matching; MapReduce; load balancing; pair deduplication
DOI
Not available
Abstract
Entity matching, which aims to find records that refer to the same real-world object, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common approach, known as the blocking-based method, selects a small proportion of record pairs for verification at a cost far lower than O(n^2), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, since many more matching records can be found in this way, which significantly improves the quality of the query result.
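As an illustration of the idea in the abstract, the following is a minimal single-machine sketch, not the MapReduce algorithm of the paper. It groups records into blocks under several blocking functions and removes candidate pairs produced more than once. The toy records, the two blocking functions (`block_by_name_prefix`, `block_by_city`), and the helper `candidate_pairs` are all hypothetical and chosen only for demonstration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (record id, name, city).
RECORDS = [
    (1, "Jon Smith", "Boston"),
    (2, "John Smith", "Boston"),
    (3, "Jon Smyth", "Austin"),
    (4, "Mary Jones", "Austin"),
    (5, "Mary Jonas", "Austin"),
]


def block_by_name_prefix(record):
    """Assumed blocking key: first three letters of the name."""
    return record[1][:3].lower()


def block_by_city(record):
    """Assumed blocking key: the city."""
    return record[2].lower()


def candidate_pairs(records, blocking_functions):
    """Return (deduplicated pairs, number of pairs emitted before deduplication)."""
    pairs = set()
    emitted = 0
    for fn in blocking_functions:
        # Group record ids by the key this blocking function assigns.
        blocks = defaultdict(list)
        for rec in records:
            blocks[fn(rec)].append(rec[0])
        # Only records that share a block are compared.
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                emitted += 1
                pairs.add((a, b))  # the set drops pairs seen under another function
    return pairs, emitted


pairs, emitted = candidate_pairs(RECORDS, [block_by_name_prefix, block_by_city])
total = len(RECORDS) * (len(RECORDS) - 1) // 2
print(f"all pairs: {total}, emitted: {emitted}, after deduplication: {len(pairs)}")
print(sorted(pairs))
```

In this toy data the pair (4, 5) falls into the same block under both functions, so it is emitted twice but verified only once. A distributed version would additionally need to balance block sizes across workers, which this sketch does not attempt.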
Pages: 895 - 911
Number of pages: 16