Evaluation of Duplicate Detection Algorithms: From Quality Measures to Test Data Generation

Cited by: 4

Authors:

Panse, Fabian [1]

Naumann, Felix [2]

Affiliations:

[1] Univ Hamburg, Hamburg, Germany

[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany

Source:

2021 IEEE 37TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2021) | 2021

Keywords:

RECORD LINKAGE;

DOI:

10.1109/ICDE51399.2021.00269

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Duplicate detection identifies multiple records in a dataset that represent the same real-world object. Many such approaches exist, both in research and in industry. To investigate essential properties of duplicate detection algorithms, such as their result quality or runtime behavior, they must be executed on suitable test data. The quality evaluation requires that these test data are labeled, constituting a ground truth. Correctly labeled, sizable, and real or at least realistic test datasets, however, are not easy to obtain, creating an obstacle for the advancement of research. In this tutorial, we present common methods to evaluate duplicate detection algorithms and to generate labeled test data. We close with a discussion of open problems.
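As a rough illustration of what evaluating a duplicate detection result against a labeled ground truth can look like, the sketch below computes pairwise precision, recall, and F1. Pairwise measures are one widely used choice and are not necessarily the ones emphasized in the tutorial. The function names, the record ids, and the convention of returning 1.0 for an empty pair set are assumptions made for this example.

```python
from itertools import combinations


def cluster_pairs(clusters):
    """Expand clusters of record ids into the set of unordered duplicate pairs."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each unordered pair one canonical (a, b) form.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted, truth):
    """Return (precision, recall, f1) over duplicate pairs.

    Assumption: an empty pair set yields 1.0 for the affected measure.
    """
    pred = cluster_pairs(predicted)
    gold = cluster_pairs(truth)
    true_positives = len(pred & gold)
    precision = true_positives / len(pred) if pred else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical ground truth and detection result over six records.
truth = [{"r1", "r2", "r3"}, {"r4"}, {"r5", "r6"}]
predicted = [{"r1", "r2"}, {"r3", "r4"}, {"r5", "r6"}]

precision, recall, f1 = pairwise_scores(predicted, truth)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here the ground truth contains four duplicate pairs and the prediction three, two of which are correct, so precision is 2/3 and recall is 1/2.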

Pages: 2373-2376

Page count: 4
