MapReduce-based entity matching with multiple blocking functions

Cited by: 0
Authors
Cheqing Jin
Jie Chen
Huiping Liu
Affiliation
[1] East China Normal University, Institute for Data Science and Engineering, School of Computer Science and Software Engineering
Source
FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (05)
Keywords
entity matching; MapReduce; load balancing; pair deduplication
DOI
Not available
Abstract
Entity matching, which aims to identify records that refer to the same real-world object, has been studied for decades. To avoid verifying every pair of records in a massive data set, a common approach known as the blocking-based method selects only a small proportion of record pairs for verification, at a cost far lower than O(n²), where n is the size of the data set. Furthermore, executing multiple blocking functions independently is critical, since many more matching records can be found in this way and the quality of the query result can therefore be improved significantly.
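The idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce algorithm: the toy records, the two blocking functions (`block_by_name_prefix`, `block_by_zip`) and the helper `candidate_pairs` are all hypothetical, chosen only to show how several blocking functions each propose candidate pairs and how a pair proposed more than once is kept only once.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (id, name, zip code). Not data from the paper.
records = [
    (1, "john smith", "10001"),
    (2, "jon smith", "10001"),
    (3, "john smyth", "10001"),
    (4, "mary jones", "20002"),
    (5, "mary jonas", "30003"),
]

# Two illustrative blocking functions (assumptions for this sketch).
def block_by_name_prefix(rec):
    return rec[1][:3]   # first three characters of the name

def block_by_zip(rec):
    return rec[2]       # full zip code

def candidate_pairs(records, blocking_functions):
    """Run each blocking function independently and return the
    deduplicated union of the record-id pairs they propose."""
    pairs = set()  # a set stores each pair once, however often it is proposed
    for fn in blocking_functions:
        blocks = defaultdict(list)
        for rec in records:
            blocks[fn(rec)].append(rec[0])
        # Only records that share a block are paired with each other.
        for ids in blocks.values():
            for a, b in combinations(sorted(ids), 2):
                pairs.add((a, b))
    return pairs

pairs = candidate_pairs(records, [block_by_name_prefix, block_by_zip])
all_pairs = len(records) * (len(records) - 1) // 2

print(sorted(pairs))   # candidate pairs selected for verification
print(len(pairs), "of", all_pairs, "possible pairs")
```

In this toy data, pair (1, 3) is proposed by both blocking functions but verified only once, and pair (4, 5) is found only because the name-prefix function runs alongside the zip-code function. In a distributed setting the same deduplication has to be coordinated across workers, which is where the load-balancing and pair-deduplication keywords come in.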
Pages: 895-911
Number of pages: 16
Related papers
50 in total
  • [1] MapReduce-based entity matching with multiple blocking functions
    Jin, Cheqing
    Chen, Jie
    Liu, Huiping
    [J]. FRONTIERS OF COMPUTER SCIENCE, 2017, 11 (05) : 895 - 911
  • [2] Improving Load Balancing for MapReduce-based Entity Matching
    Mestre, Demetrio Gomes
    Santos Pires, Carlos Eduardo
    [J]. 2013 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2013,
  • [3] Efficient MapReduce-Based Method for Massive Entity Matching
    Chao, Pingfu
    Gao, Zhu
    Li, Yuming
    Fang, Junhua
    Zhang, Rong
    Zhou, Aoying
    [J]. WEB-AGE INFORMATION MANAGEMENT (WAIM 2015), 2015, 9098 : 494 - 497
  • [4] Eliminating the Redundancy in MapReduce-based Entity Resolution
    Yan, Cairong
    Song, Yalong
    Wang, Jian
    Guo, Wenjing
    [J]. 2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 1233 - 1236
  • [5] Load Balancing for MapReduce-based Entity Resolution
    Kolb, Lars
    Thor, Andreas
    Rahm, Erhard
    [J]. 2012 IEEE 28TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2012, : 618 - 629
  • [6] Adaptive Sorted Neighborhood Blocking for Entity Matching with MapReduce
    Mestre, Demetrio Gomes
    Pires, Carlos Eduardo
    Nascimento, Dimas C.
    [J]. 30TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, VOLS I AND II, 2015, : 981 - 987
  • [7] A scalable MapReduce-based design of an unsupervised entity resolution system
    Hagan, Nicholas Kofi Akortia
    Talburt, John R.
    Anderson, Kris E.
    Hagan, Deasia
    [J]. FRONTIERS IN BIG DATA, 2024, 7
  • [8] An efficient MapReduce-based rule matching method for production system
    Li, Ying
    Liu, Weiwei
    Cao, Bin
    Yin, Jianwei
    Yao, Min
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2016, 54 : 478 - 489
  • [9] MapReduce-Based Techniques For Multiple Object Tracking in Video Analytics
    Singh, Gurinderbeer
    Majumdar, Shikharesh
    Rajan, Sreeraman
    [J]. 2017 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTED, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION (SMARTWORLD/SCALCOM/UIC/ATC/CBDCOM/IOP/SCI), 2017,
  • [10] SEMI: A Scalable Entity Matching System Based on MapReduce
    Chao, Pingfu
    Li, Yuming
    Gao, Zhu
    Fang, Junhua
    He, Xiaofeng
    Zhang, Rong
    [J]. DATABASES THEORY AND APPLICATIONS, 2015, 9093 : 328 - 332