Discovering Functional Dependencies from Mixed-Type Data

Cited by: 2
Authors
Mandros, Panagiotis [1]
Kaltenpoth, David [2]
Boley, Mario [3]
Vreeken, Jilles [2]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] CISPA Helmholtz Ctr Informat Secur, Saarbrucken, Germany
[3] Monash Univ, Melbourne, Vic, Australia
Keywords
mutual information; functional dependency discovery; mixed data; entropy
DOI
10.1145/3394486.3403193
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given complex data collections, practitioners can perform non-parametric functional dependency discovery (FDD) to uncover previously unknown relationships between variables. However, known FDD methods apply only to nominal data, so in practice non-nominal variables are discretized, e.g., in a pre-processing step. This is problematic because, as soon as a mix of discrete and continuous variables is involved, the interaction of discretization with the various dependency measures from the literature is poorly understood. In particular, it is unclear whether a given discretization method even leads to a consistent dependency estimate. In this paper, we analyze these fundamental questions and derive formal criteria for when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information. With these insights, we derive an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data. Last, we use this framework to extend a previously proposed FDD approach for reliable dependencies. Experimental evaluation shows that the derived reliable estimator is both computationally and statistically efficient, and leads to effective FDD algorithms for mixed-type data.
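To make the setting in the abstract concrete, the following is a minimal sketch of the common pre-processing practice it describes: a continuous variable is discretized and a plug-in (empirical) mutual information estimate is then computed against a nominal variable. This is not the estimator framework derived in the paper. The function names, the equal-frequency binning rule, and the bin count are assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each value a bin index so that bins hold roughly equal counts.

    Illustrative assumption: ties are split by sort order, not grouped.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // n, n_bins - 1)
    return bins


def plugin_mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def demo(n=2000, n_bins=4, seed=0):
    """Compare a dependent and an independent nominal/continuous pair."""
    rng = random.Random(seed)
    labels = [rng.choice("ab") for _ in range(n)]
    # Continuous variable whose mean depends on the nominal label.
    dependent = [rng.gauss(0.0 if lab == "a" else 3.0, 1.0) for lab in labels]
    # Continuous variable generated without reference to the label.
    independent = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mi_dep = plugin_mutual_information(labels, equal_frequency_bins(dependent, n_bins))
    mi_ind = plugin_mutual_information(labels, equal_frequency_bins(independent, n_bins))
    return mi_dep, mi_ind
```

Calling `demo()` returns one estimate for the dependent pair and one for the independent pair. Varying `n_bins` changes both estimates, which is the kind of sensitivity to discretization that the abstract identifies as poorly understood.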
Pages: 1404-1414
Page count: 11
Related papers
50 in total
  • [1] Bayesian inference of graph-based dependencies from mixed-type data
    Galimberti, Chiara
    Peluso, Stefano
    Castelletti, Federico
    [J]. JOURNAL OF MULTIVARIATE ANALYSIS, 2024, 203
  • [2] Discovering Quantitative Temporal Functional Dependencies on Clinical Data
    Combi, Carlo
    Mantovani, Matteo
    Sala, Pietro
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2017, : 248 - 257
  • [3] A Statistical Perspective on Discovering Functional Dependencies in Noisy Data
    Zhang, Yunjia
    Guo, Zhihan
    Rekatsinas, Theodoros
    [J]. SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2020, : 861 - 876
  • [4] Discovering Anomalies on Mixed-Type Data Using a Generalized Student-t Based Approach
    Lu, Yen-Cheng
    Chen, Feng
    Wang, Yating
    Lu, Chang-Tien
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (10) : 2582 - 2595
  • [5] Spectral Clustering of Mixed-Type Data
    Mbuga, Felix
    Tortora, Cristina
    [J]. STATS, 2022, 5 (01) : 1 - 11
  • [6] Discovering Conditional Functional Dependencies
    Fan, Wenfei
    Geerts, Floris
    Lakshmanan, Laks V. S.
    Xiong, Ming
    [J]. ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2009, : 1231 - +
  • [7] Discovering Conditional Functional Dependencies
    Fan, Wenfei
    Geerts, Floris
    Li, Jianzhong
    Xiong, Ming
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2011, 23 (05) : 683 - 698
  • [8] Discovering Graph Functional Dependencies
    Fan, Wenfei
    Hu, Chunming
    Liu, Xueli
    Lu, Ping
    [J]. ACM TRANSACTIONS ON DATABASE SYSTEMS, 2020, 45 (03)
  • [9] Discovering Graph Functional Dependencies
    Fan, Wenfei
    Hu, Chunming
    Liu, Xueli
    Lu, Ping
    [J]. SIGMOD'18: PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2018, : 427 - 439
  • [10] Parallel approaches for discovering functional dependencies from data for information system design recovery
    Lim, WM
    Harrison, J
    [J]. THIRD INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS, AND NETWORKS, PROCEEDINGS (I-SPAN '97), 1997, : 254 - 260