Algorithms for the discovery of embedded functional dependencies

Cited by: 6
Authors
Wei, Ziheng [1]
Hartmann, Sven [2]
Link, Sebastian [1]
Affiliations
[1] Univ Auckland, Sch Comp Sci, Auckland, New Zealand
[2] Tech Univ Clausthal, Inst Informat, Clausthal Zellerfeld, Germany
Source
VLDB JOURNAL | 2021 / Vol. 30 / Issue 06
Keywords
Algorithm; Armstrong sample; Completeness requirement; Data redundancy; Discovery; Integrity requirement; Intractability; Missing data; Functional dependency; Efficient discovery
DOI
10.1007/s00778-021-00684-3
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Embedded functional dependencies (eFDs) advance data management applications by data completeness and integrity requirements. We show that the discovery problem of eFDs is NP-complete in the output, and has a minimum solution space that is larger than the maximum solution space for functional dependencies. Nevertheless, we use novel data structures and search strategies to develop row-efficient, column-efficient, and hybrid algorithms for eFD discovery. Our experiments demonstrate that the algorithms scale well in terms of their design targets, and that ranking the eFDs by the number of redundant data values they cause can provide useful guidance in identifying meaningful eFDs for applications. We further demonstrate the benefits of introducing completeness requirements and ranking by the number of redundant data values for other variants of functional dependencies. Finally, we show how to compute informative Armstrong samples and illustrate the performance of our algorithms on the benchmark data. The informative Armstrong samples can be used to find eFDs that are meaningful for the application domain but violated by a given data set due to inconsistencies.
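The abstract does not define an eFD, so the following is only a minimal sketch under stated assumptions, not the paper's discovery algorithms. It assumes the usual reading of an eFD E: X -> Y: the FD X -> Y must hold on those rows that have no missing value on the attribute set E (with X and Y contained in E). The redundancy count is likewise an assumed simplification: it counts Y-value occurrences in E-complete rows whose X-value is shared with at least one other such row. The function names `holds_efd` and `redundant_values` and the sample rows are hypothetical.

```python
from collections import defaultdict

def is_complete(row, E):
    """True if the row has no missing value (None) on any attribute in E."""
    return all(row[a] is not None for a in E)

def holds_efd(rows, E, X, Y):
    """Check the eFD E: X -> Y on rows (list of dicts; None marks missing data).

    Assumes X and Y are both subsets of E.
    """
    seen = {}
    for row in rows:
        # Completeness requirement: only rows complete on E are in scope.
        if not is_complete(row, E):
            continue
        key = tuple(row[a] for a in X)
        val = tuple(row[a] for a in Y)
        # Integrity requirement: equal X-values must imply equal Y-values.
        if seen.setdefault(key, val) != val:
            return False
    return True

def redundant_values(rows, E, X, Y):
    """Count Y-value occurrences in in-scope rows that share their X-value.

    Assumed simplification of the ranking measure; Y is taken to be
    disjoint from X.
    """
    group_sizes = defaultdict(int)
    for row in rows:
        if is_complete(row, E):
            group_sizes[tuple(row[a] for a in X)] += 1
    return sum(n * len(Y) for n in group_sizes.values() if n > 1)

# Hypothetical sample data: the third row has a missing city.
rows = [
    {"name": "Ann", "city": "Auckland", "zip": "1010"},
    {"name": "Bob", "city": "Auckland", "zip": "1010"},
    {"name": "Cat", "city": None, "zip": "1010"},
    {"name": "Dan", "city": "Clausthal", "zip": "38678"},
]
E = ["zip", "city"]

print(holds_efd(rows, E, ["zip"], ["city"]))
print(redundant_values(rows, E, ["zip"], ["city"]))
```

In this sketch the row with the missing city falls outside the scope of E, so it neither violates the dependency nor contributes to the redundancy count.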
Pages: 1069-1093
Number of pages: 25
Related papers
50 in total
  • [1] Algorithms for the discovery of embedded functional dependencies
    Wei, Ziheng
    Hartmann, Sven
    Link, Sebastian
    [J]. The VLDB Journal, 2021, 30 : 1069 - 1093
  • [2] Discovery Algorithms for Embedded Functional Dependencies
    Wei, Ziheng
    Hartmann, Sven
    Link, Sebastian
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020: 833 - 843
  • [3] Discovery of Field Functional Dependencies
    Sun, Jizhou
    Li, Jianzhong
    Gao, Hong
    Liu, Xianmin
    [J]. 2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND KNOWLEDGE ENGINEERING (ISKE), 2015: 448 - 455
  • [4] Discovery and Ranking of Functional Dependencies
    Wei, Ziheng
    Link, Sebastian
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019: 1526 - 1537
  • [5] Distributed Discovery of Functional Dependencies
    Saxena, Hemant
    Golab, Lukasz
    Ilyas, Ihab F.
    [J]. 2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019: 1590 - 1593
  • [6] Discovery of Temporal Graph Functional Dependencies
    Noronha, Levin
    Chiang, Fei
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021: 3348 - 3352
  • [7] On discovery of functional dependencies from data
    Liu, Jixue
    Ye, Feiyue
    Li, Jiuyong
    Wang, Junhu
    [J]. DATA & KNOWLEDGE ENGINEERING, 2013, 86 : 146 - 159
  • [8] Incremental Discovery of Imprecise Functional Dependencies
    Caruccio, Loredana
    Cirillo, Stefano
    [J]. ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2020, 12 (04):
  • [9] Efficient Discovery of Ontology Functional Dependencies
    Baskaran, Sridevi
    Keller, Alexander
    Chiang, Fei
    Golab, Lukasz
    Szlichta, Jaroslaw
    [J]. CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017: 1847 - 1856
  • [10] A CLAIM TO INCORPORATE FUNCTIONAL DEPENDENCIES IN DEVELOPMENT TOOLS: Benchmarking and Checking Functional Dependencies Algorithms
    Enciso Garcia-Oliveros, Manuel
    Mora Bonilla, Angel
    Cordero, Pablo
    Baena, Rosario
    [J]. ICSOFT 2011: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON SOFTWARE AND DATABASE TECHNOLOGIES, VOL 1, 2011: 313 - 316