Discovering Conditional Functional Dependencies

Cited by: 108
Authors
Fan, Wenfei [1]
Geerts, Floris [1]
Li, Jianzhong [2]
Xiong, Ming [3]
Affiliations
[1] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
[2] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[3] Bell Labs, Murray Hill, NJ 07974 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Integrity; conditional functional dependency; functional dependency; free item set; closed item set;
DOI
10.1109/TKDE.2010.154
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation is large, but it does not scale well with the arity of the relation. FastCFD is far more efficient than CTANE when the arity of the relation is large; better still, leveraging optimization based on closed-item-set mining, FastCFD also scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools for users to choose for different applications.
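To make the notion of a CFD concrete, the following is a minimal Python sketch of how a single CFD can be checked against a relation. It is not one of the paper's discovery algorithms (CFDMiner, CTANE, FastCFD); it only illustrates what it means for data to violate a rule. The function names, the `_` wildcard convention, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "_"  # assumed notation for an unnamed variable: matches any value


def matches(value, pattern_value):
    """A value matches a pattern entry if the entry is a wildcard or an equal constant."""
    return pattern_value == WILDCARD or value == pattern_value


def cfd_violations(rows, lhs, rhs, pattern):
    """Return groups of rows that violate the CFD (lhs -> rhs, pattern).

    rows    : list of dicts (one dict per tuple of the relation)
    lhs     : list of attribute names on the left-hand side
    rhs     : single attribute name on the right-hand side
    pattern : dict mapping each attribute in lhs + [rhs] to a constant or WILDCARD
    """
    # Only tuples whose LHS values match the pattern are constrained by the CFD.
    groups = defaultdict(list)
    for row in rows:
        if all(matches(row[a], pattern[a]) for a in lhs):
            groups[tuple(row[a] for a in lhs)].append(row)

    bad = []
    for group in groups.values():
        rhs_values = {row[rhs] for row in group}
        agree = len(rhs_values) == 1  # the embedded FD must hold within the group
        fit = all(matches(v, pattern[rhs]) for v in rhs_values)  # RHS pattern must match
        if not (agree and fit):
            bad.append(group)
    return bad


# Hypothetical sample data: country code (CC), area code (AC), city.
rows = [
    {"CC": "44", "AC": "131", "city": "Edinburgh"},
    {"CC": "44", "AC": "131", "city": "Glasgow"},
    {"CC": "01", "AC": "908", "city": "Murray Hill"},
]

# Constant CFD: every pattern entry is a constant.
constant_cfd = {"CC": "44", "AC": "131", "city": "Edinburgh"}
# General CFD: a constant on CC, wildcards elsewhere.
general_cfd = {"CC": "01", "AC": WILDCARD, "city": WILDCARD}

constant_result = cfd_violations(rows, ["CC", "AC"], "city", constant_cfd)
general_result = cfd_violations(rows, ["CC", "AC"], "city", general_cfd)
```

In this sketch the constant rule flags the two rows with CC 44 and AC 131, because they disagree on the city, while the general rule finds no violation among the rows with CC 01.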
Pages: 683-698
Number of pages: 16