Evaluation of Duplicate Detection Algorithms: From Quality Measures to Test Data Generation

Cited by: 4
Authors
Panse, Fabian [1 ]
Naumann, Felix [2 ]
Affiliations
[1] Univ Hamburg, Hamburg, Germany
[2] Univ Potsdam, Hasso Plattner Inst, Potsdam, Germany
Keywords
RECORD LINKAGE
DOI
10.1109/ICDE51399.2021.00269
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Duplicate detection identifies multiple records in a dataset that represent the same real-world object. Many such approaches exist, both in research and in industry. To investigate essential properties of duplicate detection algorithms, such as their result quality or runtime behavior, they must be executed on suitable test data. The quality evaluation requires that these test data are labeled, constituting a ground truth. Correctly labeled, sizable, and real or at least realistic test datasets, however, are not easy to obtain, creating an obstacle for the advancement of research. In this tutorial, we present common methods to evaluate duplicate detection algorithms and to generate labeled test data. We close with a discussion of open problems.
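As an illustration of why labeled test data matter for quality evaluation, the following is a minimal sketch of pairwise precision, recall, and F1 computed against a ground truth. The abstract does not name specific measures, so the choice of pairwise measures, the function names `duplicate_pairs` and `pairwise_quality`, and the toy record ids are assumptions made for this example only.

```python
from itertools import combinations


def duplicate_pairs(clusters):
    """Expand clusters of record ids into the set of unordered duplicate pairs."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each unordered pair one canonical representation.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_quality(predicted_clusters, truth_clusters):
    """Return (precision, recall, f1) over duplicate pairs (hypothetical helper)."""
    predicted = duplicate_pairs(predicted_clusters)
    truth = duplicate_pairs(truth_clusters)
    true_positives = len(predicted & truth)
    # Convention assumed here: an empty pair set yields a score of 1.0.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy ground truth and detection result (invented record ids).
truth = [{"r1", "r2", "r3"}, {"r4", "r5"}]
predicted = [{"r1", "r2"}, {"r3", "r4", "r5"}]

precision, recall, f1 = pairwise_quality(predicted, truth)
print(precision, recall, f1)
```

Without the labeled clusters in `truth`, none of the three values could be computed, which is the obstacle the abstract describes.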
Pages: 2373 - 2376
Page count: 4
Related papers
  • [1] Regression Algorithms in Hyperspectral Data Analysis for Meat Quality Detection and Evaluation
    Pan, Ting-Tiao
    Sun, Da-Wen
    Cheng, Jun-Hu
    Pu, Hongbin
    COMPREHENSIVE REVIEWS IN FOOD SCIENCE AND FOOD SAFETY, 2016, 15 (03): : 529 - 541
  • [2] Genetic algorithms for dynamic test data generation
    Michael, CC
    McGraw, GE
    Schatz, MA
    Walton, CC
    AUTOMATED SOFTWARE ENGINEERING, 12TH IEEE INTERNATIONAL CONFERENCE, PROCEEDINGS, 1997, : 307 - 308
  • [3] Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data
    Cho, Hyeongmin
    Lee, Sangkyun
    APPLIED SCIENCES-BASEL, 2021, 11 (02): : 1 - 17
  • [4] TDG4Crowd: Test Data Generation for Evaluation of Aggregation Algorithms in Crowdsourcing
    Fang, Yili
    Shen, Chaojie
    Gu, Huamao
    Han, Tao
    Ding, Xinyi
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 2984 - 2992
  • [5] Building test data from real outbreaks for evaluating detection algorithms
    Texier, Gaetan
    Jackson, Michael L.
    Siwe, Leonel
    Meynard, Jean-Baptiste
    Deparis, Xavier
    Chaudet, Herve
    PLOS ONE, 2017, 12 (09):
  • [6] Test-data generation using genetic algorithms
    Pargas, Roy P.
    Harrold, Mary Jean
    Peck, Robert R.
    Software Testing Verification and Reliability, 1999, 9 (04): : 263 - 282
  • [7] EXPERIMENTAL EVALUATION OF TESTABILITY MEASURES FOR TEST-GENERATION
    CHANDRA, SJ
    PATEL, JH
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 1989, 8 (01) : 93 - 97
  • [8] An Empirical Evaluation of Evolutionary Algorithms for Test Suite Generation
    Campos, Jose
    Ge, Yan
    Fraser, Gordon
    Eler, Marcelo
    Arcuri, Andrea
    SEARCH BASED SOFTWARE ENGINEERING, SSBSE 2017, 2017, 10452 : 33 - 48
  • [9] USING DATA QUALITY MEASURES IN DECISION-MAKING ALGORITHMS
    DILLARD, RA
    IEEE EXPERT-INTELLIGENT SYSTEMS & THEIR APPLICATIONS, 1992, 7 (06): : 63 - 72
  • [10] Improving Eye-Tracking Data Quality: A Framework for Reproducible Evaluation of Detection Algorithms
    Gundler, Christopher
    Temmen, Matthias
    Gulberti, Alessandro
    Poetter-Nerger, Monika
    Ueckert, Frank
    SENSORS, 2024, 24 (09)