Scalable Similarity Joins for Fast and Accurate Record Deduplication in Big Data

Cited by: 0
Authors
Rozinek, Ondrej [1]
Borkovcova, Monika [2]
Mares, Jan [1, 3]
Affiliations
[1] Department of Process Control, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic
[2] Department of Information Technology, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic
[3] Department of Mathematics, Informatics and Cybernetics, University of Chemistry and Technology Prague, Technicka 5, Prague, 166 28, Czech Republic
Source
Engineering Village
Keywords
Bipartite matchings; Data-source; Deduplication; Entity resolutions; Matchings; Q-gram filters; Record deduplication; Record linkage; Similarity join; Similarity spaces
DOI
Not available
CLC classification number
Subject classification number
Abstract
Pages: 181 - 191
Related papers
  • [1] Scalable Similarity Joins for Fast and Accurate Record Deduplication in Big Data
    Rozinek, Ondrej
    Borkovcova, Monika
    Mares, Jan
    GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 6, WORLDCIST 2024, 2024, 990 : 181 - 191
  • [2] Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics
    Rong, Chuitian
    Lin, Chunbin
    Silva, Yasin N.
    Wang, Jianguo
    Lu, Wei
    Du, Xiaoyong
    2017 IEEE 33RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2017), 2017, : 1059 - 1070
  • [3] Fast and scalable vector similarity joins with MapReduce
    Byoungju Yang
    Hyun Joon Kim
    Junho Shim
    Dongjoo Lee
    Sang-goo Lee
    Journal of Intelligent Information Systems, 2016, 46 : 473 - 497
  • [4] Fast and scalable vector similarity joins with MapReduce
    Yang, Byoungju
    Kim, Hyun Joon
    Shim, Junho
    Lee, Dongjoo
    Lee, Sang-goo
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2016, 46 (03) : 473 - 497
  • [5] Hadoop Based Scalable Cluster Deduplication for Big Data
    Liu, Qing
    Fu, Yinjin
    Ni, Guiqiang
    Hou, Rui
    2016 IEEE 36TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS WORKSHOPS (ICDCSW 2016), 2016, : 98 - 105
  • [6] Scalable Similarity Joins of Tokenized Strings
    Metwally, Ahmed
    Huang, Chun-Heng
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019, : 1766 - 1777
  • [7] Fast and scalable inequality joins
    Zuhair Khayyat
    William Lucia
    Meghna Singh
    Mourad Ouzzani
    Paolo Papotti
    Jorge-Arnulfo Quiané-Ruiz
    Nan Tang
    Panos Kalnis
    The VLDB Journal, 2017, 26 : 125 - 150
  • [8] Fast and scalable inequality joins
    Khayyat, Zuhair
    Lucia, William
    Singh, Meghna
    Ouzzani, Mourad
    Papotti, Paolo
    Quiane-Ruiz, Jorge-Arnulfo
    Tang, Nan
    Kalnis, Panos
    VLDB JOURNAL, 2017, 26 (01) : 125 - 150
  • [9] Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data
    Fang, Yixiang
    Cheng, Reynold
    Tang, Wenbin
    Maniu, Silviu
    Yang, Xuan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (03) : 785 - 800
  • [10] Scalable Algorithms for Nearest-Neighbor Joins on Big Trajectory Data
    Fang, Yixiang
    Cheng, Reynold
    Tang, Wenbin
    Maniu, Silviu
    Yang, Xuan
    2016 32ND IEEE INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2016, : 1528 - 1529