Discovering Functional Dependencies from Mixed-Type Data

Cited by: 2
Authors
Mandros, Panagiotis [1]
Kaltenpoth, David [2]
Boley, Mario [3]
Vreeken, Jilles [2]
Affiliations
[1] Max Planck Institute for Informatics, Saarbrücken, Germany
[2] CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
[3] Monash University, Melbourne, VIC, Australia
Keywords
mutual information; functional dependency discovery; mixed data; entropy
DOI
10.1145/3394486.3403193
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Given complex data collections, practitioners can perform non-parametric functional dependency discovery (FDD) to uncover previously unknown relationships between variables. However, known FDD methods are applicable only to nominal data, so in practice non-nominal variables are discretized, e.g., in a pre-processing step. This is problematic because, as soon as a mix of discrete and continuous variables is involved, the interaction of discretization with the various dependency measures from the literature is poorly understood. In particular, it is unclear whether a given discretization method even leads to a consistent dependency estimate. In this paper, we analyze these fundamental questions and derive formal criteria for when a discretization process applied to a mixed set of random variables leads to consistent estimates of mutual information. With these insights, we derive an estimator framework applicable to any task that involves estimating mutual information from multivariate and mixed-type data. Finally, we use this framework to extend a previously proposed FDD approach for reliable dependencies. Experimental evaluation shows that the derived reliable estimator is both computationally and statistically efficient, and leads to effective FDD algorithms for mixed-type data.
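To make the setting in the abstract concrete, the following is a minimal sketch of the naive baseline it questions: discretize a continuous variable in a pre-processing step, then compute a plug-in mutual information estimate against a nominal variable. It is not the estimator derived in the paper. The function names, the synthetic data, and the bin-count heuristic are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(values, n_bins):
    """Assign each continuous value a bin index in 0..n_bins-1 by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def plugin_mutual_information(xs, ys):
    """Plug-in (empirical) mutual information, in nats, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ) written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


random.seed(0)
n = 2000

# Synthetic mixed-type data (assumption): a nominal label and two continuous
# variables, one whose mean depends on the label and one that does not.
labels = [random.choice("ab") for _ in range(n)]
dependent = [random.gauss(0.0 if l == "a" else 3.0, 1.0) for l in labels]
independent = [random.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical heuristic for the number of bins; the abstract's point is that
# such choices determine whether the estimate is consistent at all.
n_bins = max(2, round(math.sqrt(n)) // 4)

mi_dep = plugin_mutual_information(labels, equal_frequency_bins(dependent, n_bins))
mi_ind = plugin_mutual_information(labels, equal_frequency_bins(independent, n_bins))

print(f"bins={n_bins}  MI(label; dependent)={mi_dep:.3f}  MI(label; independent)={mi_ind:.3f}")
```

The estimate for the independent variable is small but typically not exactly zero, because the plug-in estimator is biased upward for finite samples and the bias grows with the number of bins. That sensitivity to the discretization is the kind of behavior the paper's consistency criteria are meant to address.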
Pages: 1404-1414
Page count: 11