An Efficient and Robust Approach for Discovering Data Quality Rules

Cited by: 11
Authors
Yeh, Peter Z. [1]
Puri, Colin A. [1]
Affiliations
[1] Accenture Technology Labs, San Jose, CA, USA
DOI
10.1109/ICTAI.2010.43
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Poor quality data is a growing problem that affects many enterprises across all aspects of their business ranging from operational efficiency to revenue protection. Moreover, this problem is costly to fix because significant effort and resources are required to identify a comprehensive set of rules that can detect (and correct) data defects along various data quality dimensions such as consistency, conformity, and more. Hence, many organizations employ only basic data quality rules that check for null values, format, etc. in efforts such as data profiling and data cleansing; and ignore rules that are needed to detect deeper problems such as inconsistent values across interdependent attributes. This oversight can lead to numerous problems such as inaccurate reporting of key metrics used to inform critical decisions or derive business insights. In this paper, we present an approach that efficiently and robustly discovers data quality rules - in particular conditional functional dependencies - for detecting inconsistencies in data and hence improves data quality along the critical dimension of consistency. We evaluate our approach empirically on several real-world data sets. We show that our approach performs well on these data sets for metrics such as precision and recall. We also compare our approach to an established solution and show that our approach outperforms this solution for the same metrics. Finally, we show that our approach scales efficiently with the number of records, the number of attributes, and the domain size.
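To make the term concrete: a conditional functional dependency (CFD) is a functional dependency that is required to hold only on the records matching a given pattern. The Python sketch below checks one such rule against a handful of records and reports the inconsistent ones. It illustrates rule checking only and is not the discovery approach presented in the paper; the function names, the `_` wildcard convention, and the sample records are all assumptions made up for this illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker meaning "any value"


def matches(record, pattern):
    """True if the record agrees with every constant in the pattern."""
    return all(v == WILDCARD or record.get(a) == v for a, v in pattern.items())


def cfd_violations(records, lhs, rhs, rhs_value=WILDCARD):
    """Return sorted indices of records that violate the rule lhs -> rhs.

    lhs maps attribute names to a constant or WILDCARD; rhs is one attribute.
    If rhs_value is a constant, each matching record must carry that value.
    Otherwise, matching records that agree on all lhs attributes must also
    agree on rhs.
    """
    groups = defaultdict(list)
    violations = set()
    for i, rec in enumerate(records):
        if not matches(rec, lhs):
            continue  # the rule does not apply to this record
        if rhs_value != WILDCARD:
            if rec.get(rhs) != rhs_value:
                violations.add(i)
        else:
            key = tuple(rec.get(a) for a in sorted(lhs))
            groups[key].append(i)
    for idxs in groups.values():
        if len({records[i].get(rhs) for i in idxs}) > 1:
            violations.update(idxs)  # the group disagrees on rhs
    return sorted(violations)


# Invented sample data for illustration only.
records = [
    {"country": "UK", "zip": "EH4 1DT", "street": "Mayfield"},
    {"country": "UK", "zip": "EH4 1DT", "street": "Crichton"},
    {"country": "UK", "zip": "W1B 1JH", "street": "Regent"},
    {"country": "US", "zip": "07974", "street": "Mountain Ave"},
    {"country": "US", "zip": "07974", "street": "Main St"},
]

# Rule: for UK records only, zip determines street.
uk_rule_violations = cfd_violations(
    records, {"country": "UK", "zip": WILDCARD}, "street"
)
# Rule with a constant right-hand side: zip 07974 implies country US.
us_rule_violations = cfd_violations(
    records, {"zip": "07974"}, "country", rhs_value="US"
)
print(uk_rule_violations, us_rule_violations)
```

The two US records share a zip code but differ in street, yet they are not flagged by the first rule because its pattern restricts it to UK records. That conditioning is what distinguishes a CFD from an ordinary functional dependency.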
Pages: 8
Related papers
50 items in total
  • [1] Discovering Data Quality Rules
    Chiang, Fei
    Miller, Renee J.
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (01): : 1166 - 1177
  • [2] An efficient data mining technique for discovering interesting association rules
    Yen, SJ
    Chen, ALP
    EIGHTH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 1997, : 664 - 669
  • [3] Discovering associations in spatial data - An efficient medoid based approach
    Estivill-Castro, V
    Murray, AT
    RESEARCH AND DEVELOPMENT IN KNOWLEDGE DISCOVERY AND DATA MINING, 1998, 1394 : 110 - 121
  • [4] An association based approach to discovering ordering rules
    Liu, Da-Zhong
    Gao, Yuan
    Zhao, Jian-Ping
    PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 202 - 205
  • [5] What Are the Rules? Discovering Constraints from Data
    Wiegand, Boris
    Klakow, Dietrich
    Vreeken, Jilles
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 8, 2024, : 8182 - 8190
  • [6] Discovering interesting rules from financial data
    Soldacki, P
    Protaziuk, G
    INTELLIGENT INFORMATION SYSTEMS 2002, PROCEEDINGS, 2002, 17 : 109 - 119
  • [7] Discovering dispatching rules using data mining
    Li, XN
    Olafsson, S
    JOURNAL OF SCHEDULING, 2005, 8 (06) : 515 - 527
  • [8] Discovering Dialog Rules by means of an Evolutionary Approach
    Griol, David
    Callejas, Zoraida
    INTERSPEECH 2019, 2019, : 1473 - 1477
  • [9] Discovering validation rules from microbiological data
    Lamma, E
    Riguzzi, F
    Storari, S
    Mello, P
    Nanetti, A
    NEW GENERATION COMPUTING, 2003, 21 (02) : 123 - 133
  • [10] Discovering Dispatching Rules Using Data Mining
    Xiaonan Li
    Sigurdur Olafsson
    Journal of Scheduling, 2005, 8 : 515 - 527