Scalable Similarity Joins for Fast and Accurate Record Deduplication in Big Data

Cited by: 0

Authors:

Rozinek, Ondrej [1]

Borkovcova, Monika [2]

Mares, Jan [1,3]

Affiliations:

[1] Department of Process Control, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic

[2] Department of Information Technology, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic

[3] Department of Mathematics, Informatics and Cybernetics, University of Chemistry and Technology Prague, Technicka 5, Prague, 166 28, Czech Republic

Source:

Lecture Notes in Networks and Systems | 2024 / Vol. 990 LNNS

Keywords:

Engineering Village;

DOI:

Not available

Abstract:

Bipartite matchings - Data-source - Deduplication - Entity resolutions - Matchings - Q-gram filters - Record deduplication - Record linkage - Similarity join - Similarity spaces

Pages: 181-191

50 records in total

[31] BIGMiner: a fast and scalable distributed frequent pattern miner for big data
Kang-Wook Chon
Min-Soo Kim
Cluster Computing, 2018, 21 : 1507 - 1520
[32] MassJoin: A MapReduce-based Method for Scalable String Similarity Joins
Deng, Dong
Li, Guoliang
Hao, Shuang
Wang, Jiannan
Feng, Jianhua
2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 340 - 351
[33] On evaluating text similarity measures for customer data deduplication
Boinski, Pawel
Sienkiewicz, Mariusz
Wrembel, Robert
Bebel, Bartosz
Andrzejewski, Witold
38TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2023, 2023, : 297 - 300
[34] Characterizing the Efficiency of Data Deduplication for Big Data Storage Management
Zhou, Ruijin
Liu, Ming
Li, Tao
2013 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION (IISWC 2013), 2013, : 98 - 108
[35] A Bloom Filter-Based Data Deduplication for Big Data
Podder, Shrayasi
Mukherjee, S.
ADVANCES IN DATA AND INFORMATION SCIENCES, VOL 1, 2018, 38 : 161 - 168
[36] Entity deduplication in big data graphs for scholarly communication
Manghi, Paolo
Atzori, Claudio
De Bonis, Michele
Bardi, Alessia
DATA TECHNOLOGIES AND APPLICATIONS, 2020, 54 (04) : 409 - 435
[37] Boafft: Distributed Deduplication for Big Data Storage in the Cloud
Luo, Shengmei
Zhang, Guangyan
Wu, Chengwen
Khan, Samee U.
Li, Keqin
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (04) : 1199 - 1211
[38] A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud
Yang, Chi
Chen, Jinjun
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (06) : 1144 - 1157
[39] Prefix Tree Indexing for Similarity Search and Similarity Joins on Genomic Data
Rheinlaender, Astrid
Knobloch, Martin
Hochmuth, Nicky
Leser, Ulf
SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 2010, 6187 : 519 - 536
[40] Intelligent Similarity Joins for Big Data Integration
Wang, Mian
Nie, Tiezheng
Shen, Derong
Kou, Yue
Yu, Ge
2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 383 - 388
