Scalable Similarity Joins for Fast and Accurate Record Deduplication in Big Data

Cited by: 0

Authors:

Rozinek, Ondrej [1]

Borkovcova, Monika [2]

Mares, Jan [1,3]

Affiliations:

[1] Department of Process Control, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic

[2] Department of Information Technology, University of Pardubice, Studentska 95, Pardubice, 532 10, Czech Republic

[3] Department of Mathematics, Informatics and Cybernetics, University of Chemistry and Technology Prague, Technicka 5, Prague, 166 28, Czech Republic

Source:

Lecture Notes in Networks and Systems | 2024 / Vol. 990 LNNS

Keywords (source: Engineering Village):

Bipartite matchings; Data-source; Deduplication; Entity resolutions; Matchings; Q-gram filters; Record deduplication; Record linkage; Similarity join; Similarity spaces

DOI:

Not available

CLC number:

Subject classification code:


Pages: 181-191
