MapReduce-based entity matching with multiple blocking functions

Authors
Cheqing Jin
Jie Chen
Huiping Liu
Affiliation
[1] East China Normal University, Institute for Data Science and Engineering, School of Computer Science and Software Engineering
Keywords
entity matching; MapReduce; load balancing; pair deduplication
DOI
Not available
Abstract
Entity matching, which aims to find records that refer to the same real-world object, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common approach, known as the blocking-based method, selects only a small proportion of record pairs for verification, at a cost far lower than O(n^2), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, since many more matching records can be found this way, which significantly improves the quality of the query result.
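The idea of blocking with several independent blocking functions, followed by pair deduplication, can be illustrated with a minimal single-machine sketch in Python. This is not the paper's MapReduce algorithm and does not address load balancing. The toy records, the two blocking functions (`block_by_name_prefix`, `block_by_city`), and the helper `candidate_pairs` are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, name, city). Not data from the paper.
RECORDS = [
    (1, "Jon Smith", "Boston"),
    (2, "John Smith", "Boston"),
    (3, "Jon Smyth", "Boston"),
    (4, "Mary Jones", "Austin"),
    (5, "Maria Jones", "Denver"),
]


# Two illustrative blocking functions (assumptions made for this example).
def block_by_name_prefix(record):
    """Blocking key: first three letters of the name, lowercased."""
    return record[1][:3].lower()


def block_by_city(record):
    """Blocking key: city, lowercased."""
    return record[2].lower()


def candidate_pairs(records, blocking_functions):
    """Apply each blocking function independently and merge the results.

    Only records that share a blocking key are paired. Storing pairs as
    sorted id tuples in a set removes pairs produced by more than one
    blocking function, so each pair is verified at most once.
    """
    pairs = set()
    for func in blocking_functions:
        blocks = defaultdict(list)
        for record in records:
            blocks[func(record)].append(record[0])
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                pairs.add((a, b))
    return pairs


PAIRS = candidate_pairs(RECORDS, [block_by_name_prefix, block_by_city])
TOTAL_PAIRS = len(RECORDS) * (len(RECORDS) - 1) // 2  # exhaustive n(n-1)/2

print(sorted(PAIRS))
print(f"{len(PAIRS)} candidate pairs instead of {TOTAL_PAIRS}")
```

In this toy input, the pair (1, 3) is produced by both blocking functions but is kept only once, and the pair (4, 5) is found only by the name-prefix function, which shows why running more than one blocking function can surface additional candidates.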
Pages: 895-911
Number of pages: 16