A Statistical Perspective on Discovering Functional Dependencies in Noisy Data

Cited by: 13
Authors
Zhang, Yunjia [1]
Guo, Zhihan [1]
Rekatsinas, Theodoros [1]
Affiliations
[1] UW Madison, Madison, WI 53706 USA
Keywords
Functional Dependencies; Structure Learning; COVARIANCE ESTIMATION; NETWORKS; MODELS;
DOI
10.1145/3318464.3389749
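Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract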
We study the problem of discovering functional dependencies (FDs) from a noisy data set. We adopt a statistical perspective and draw connections between FD discovery and structure learning in probabilistic graphical models. We show that discovering FDs from a noisy data set is equivalent to learning the structure of a model over binary random variables, where each random variable corresponds to a functional of the data set attributes. We build upon this observation to introduce FDX, a conceptually simple framework in which learning functional dependencies corresponds to solving a sparse regression problem. We show that FDX can recover true functional dependencies across a diverse array of real-world and synthetic data sets, even in the presence of noisy or missing data. We find that FDX scales to large data instances with millions of tuples and hundreds of attributes while yielding an average F1 improvement of 2x over state-of-the-art FD discovery methods.
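The abstract's idea of modeling FDs through binary random variables can be illustrated with a small sketch. This is not the FDX implementation and does not perform the sparse regression step; it only assumes one plausible choice of binary variable (whether a pair of tuples agrees on an attribute) and uses the standard definition of an FD X -> Y: any two tuples that agree on X also agree on Y. The function names `agreement_indicators` and `fd_support`, and the toy table, are hypothetical.

```python
from itertools import combinations


def agreement_indicators(rows, attributes):
    """For every pair of rows, build one binary indicator per attribute:
    1 if the two rows agree on that attribute, 0 otherwise."""
    return [
        {a: int(r1[a] == r2[a]) for a in attributes}
        for r1, r2 in combinations(rows, 2)
    ]


def fd_support(indicators, lhs, rhs):
    """Among row pairs that agree on every attribute in lhs, return the
    fraction that also agree on rhs. An exact FD lhs -> rhs gives 1.0;
    noise pushes the value below 1.0. Returns None if no pair agrees on lhs."""
    matching = [ind for ind in indicators if all(ind[a] for a in lhs)]
    if not matching:
        return None
    return sum(ind[rhs] for ind in matching) / len(matching)


# Hypothetical toy table with one noisy cell (a typo in "city").
rows = [
    {"zip": "53706", "city": "Madison", "state": "WI"},
    {"zip": "53706", "city": "Madison", "state": "WI"},
    {"zip": "53706", "city": "Madisn", "state": "WI"},  # noisy value
    {"zip": "60601", "city": "Chicago", "state": "IL"},
    {"zip": "60601", "city": "Chicago", "state": "IL"},
]
attributes = ["zip", "city", "state"]

indicators = agreement_indicators(rows, attributes)
zip_to_state = fd_support(indicators, ["zip"], "state")
zip_to_city = fd_support(indicators, ["zip"], "city")

print("zip -> state support:", zip_to_state)
print("zip -> city support:", zip_to_city)
```

In this toy table, `zip -> state` holds on every agreeing pair, while the single typo lowers the support of `zip -> city`. That sensitivity to noisy cells is the reason a statistical treatment, rather than an exact check, is attractive.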
Pages: 861-876
Number of pages: 16
Related papers
50 in total
  • [1] Discovering Quantitative Temporal Functional Dependencies on Clinical Data
    Combi, Carlo
    Mantovani, Matteo
    Sala, Pietro
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2017, : 248 - 257
  • [2] Discovering Conditional Functional Dependencies
    Fan, Wenfei
    Geerts, Floris
    Li, Jianzhong
    Xiong, Ming
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 683 - 698
  • [3] Discovering Conditional Functional Dependencies
    Fan, Wenfei
    Geerts, Floris
    Lakshmanan, Laks V. S.
    Xiong, Ming
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 1231 - +
  • [4] Discovering Graph Functional Dependencies
    Fan, Wenfei
    Hu, Chunming
    Liu, Xueli
    Lu, Ping
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2020, 45 (03):
  • [5] Discovering Graph Functional Dependencies
    Fan, Wenfei
    Hu, Chunming
    Liu, Xueli
    Lu, Ping
    [J]. SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 427 - 439
  • [6] Discovering Functional Dependencies from Mixed-Type Data
    Mandros, Panagiotis
    Kaltenpoth, David
    Boley, Mario
    Vreeken, Jilles
    [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 1404 - 1414
  • [7] Discovering Reliable Approximate Functional Dependencies
    Mandros, Panagiotis
    Boley, Mario
    Vreeken, Jilles
    [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 355 - 363
  • [8] Preprocessing noisy functional data: A multivariate perspective
    Hoermann, Siegfried
    Jammoul, Fatima
    [J]. ELECTRONIC JOURNAL OF STATISTICS, 2022, 16 (02) : 6232 - 6266
  • [9] Using transversals for discovering XML functional dependencies
    Trinh, Thu
    [J]. FOUNDATIONS OF INFORMATION AND KNOWLEDGE SYSTEMS, PROCEEDINGS, 2008, 4932 : 199 - 218
  • [10] Discovering functional evolutionary dependencies in human cancers
    Mina, Marco
    Iyer, Arvind
    Tavernari, Daniele
    Raynaud, Franck
    Ciriello, Giovanni
    [J]. Nature Genetics, 2020, 52 : 1198 - 1207