An Efficient Algorithm for Computing Entropic Measures of Feature Subsets

Author
Pennerath, Frederic [1, 2]
Affiliations
[1] Univ Lorraine, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
[2] Univ Paris Saclay, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
Keywords
Pattern mining; Entropic measures; Algorithm efficiency; Approximate functional dependency; Pattern redundancy;
DOI
10.1007/978-3-030-10928-8_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Entropic measures such as conditional entropy or mutual information have been used numerous times in pattern mining, for instance to characterize valuable itemsets or approximate functional dependencies. Surprisingly, the fundamental problem of designing efficient algorithms to compute the entropy of feature subsets (or the mutual information of feature subsets relative to some target feature) has received little attention compared to the analogous problem of computing the frequency of itemsets. The present article proposes to fill this gap: it introduces a fast and scalable method that computes entropy and mutual information for a large number of feature subsets by adopting the divide-and-conquer strategy used by FP-growth, one of the most efficient frequent itemset mining algorithms. To illustrate its practical interest, the algorithm is then used to solve the recently introduced problem of mining reliable approximate functional dependencies. Finally, the article provides empirical evidence that, in the context of non-redundant pattern extraction, the proposed method outperforms existing algorithms in both speed and scalability. Code related to this chapter is available at: https://github.com/P-Fred/HFP-Growth.
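The following minimal Python sketch makes the entropic measures named in the abstract concrete. It computes each one by brute force from empirical counts, one feature subset at a time. It assumes categorical data stored as a list of dicts, a representation chosen here for illustration and not taken from the article. Conditional entropy and mutual information both follow from joint entropies:

- H(Y | X) = H(X ∪ {Y}) - H(X)
- I(X; Y) = H(Y) - H(Y | X)

The hypothetical helper `fraction_of_information` divides I(X; Y) by H(Y). This ratio is one common score for approximate functional dependencies, and it equals 1 exactly when X determines Y in the data. The *reliable* variant mentioned above additionally corrects this score for the bias of its empirical estimate. That correction is omitted here.

```python
from collections import Counter
from math import log2
from typing import Sequence

# Assumed encoding: each row is a dict mapping a feature name to a categorical value.


def entropy(rows: Sequence[dict], features: Sequence[str]) -> float:
    """Empirical Shannon entropy, in bits, of the joint values of `features`."""
    n = len(rows)
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return sum(c / n * log2(n / c) for c in counts.values())


def conditional_entropy(rows: Sequence[dict], target: str, features: Sequence[str]) -> float:
    """H(target | features) = H(features + [target]) - H(features)."""
    return entropy(rows, list(features) + [target]) - entropy(rows, features)


def mutual_information(rows: Sequence[dict], target: str, features: Sequence[str]) -> float:
    """I(features; target) = H(target) - H(target | features)."""
    return entropy(rows, [target]) - conditional_entropy(rows, target, features)


def fraction_of_information(rows: Sequence[dict], target: str, features: Sequence[str]) -> float:
    """Hypothetical helper: I(features; target) / H(target), in [0, 1]."""
    h_y = entropy(rows, [target])
    if h_y == 0:
        return 1.0  # convention: a constant target is trivially determined
    return mutual_information(rows, target, features) / h_y


# Toy data: y = a XOR b, so neither feature alone says anything about y.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 1},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 0},
]
print(mutual_information(rows, "y", ["a"]))            # 0.0
print(fraction_of_information(rows, "y", ["a", "b"]))  # 1.0
```

The XOR data show why subsets, rather than single features, have to be scored. Neither `a` nor `b` alone carries any information about `y`, yet together they determine it exactly. Note also that this naive version makes a full pass over the rows for every subset it evaluates.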
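A divide-and-conquer enumeration of feature subsets can be illustrated with a simpler structure than the one used in the article. The sketch below is *not* HFP-growth. It is a hypothetical depth-first enumeration in the spirit of FP-growth's recursive projections.

Each recursive call inherits its parent's partition of the rows, that is, the groups of rows agreeing on every feature of the current subset. It then refines that partition by one more feature, so it reuses the work done for the prefix instead of regrouping the rows on the whole subset. Entropy follows from block sizes as H(X) = Σ_B (|B|/n) log2(n/|B|). Mutual information with the target follows as I(X; Y) = H(X) + H(Y) - H(X ∪ {Y}).

The sketch assumes rows are tuples of categorical values, with features and target given as column indices. `enumerate_entropies`, `max_size` and the helper names are illustrative only.

```python
from math import log2
from typing import Dict, List, Sequence, Tuple

Rows = Sequence[Sequence[int]]  # assumed encoding: one tuple of categorical values per row
Blocks = List[List[int]]        # a partition of row indices into equivalence classes


def entropy_of_partition(blocks: Blocks, n: int) -> float:
    """H = sum over blocks B of (|B|/n) * log2(n/|B|)."""
    return sum(len(b) / n * log2(n / len(b)) for b in blocks)


def refine(rows: Rows, blocks: Blocks, column: int) -> Blocks:
    """Split every block according to the value its rows take on `column`."""
    refined: Blocks = []
    for block in blocks:
        groups: Dict[int, List[int]] = {}
        for i in block:
            groups.setdefault(rows[i][column], []).append(i)
        refined.extend(groups.values())
    return refined


def enumerate_entropies(rows: Rows, features: List[int], target: int,
                        max_size: int) -> Dict[Tuple[int, ...], Tuple[float, float]]:
    """Map every feature subset of size <= max_size to (H(X), I(X; target))."""
    n = len(rows)
    everything: Blocks = [list(range(n))]
    h_y = entropy_of_partition(refine(rows, everything, target), n)
    results: Dict[Tuple[int, ...], Tuple[float, float]] = {}

    def visit(subset: Tuple[int, ...], blocks: Blocks, start: int) -> None:
        # Only extend with features that come later in `features`,
        # so each subset is generated exactly once.
        for k in range(start, len(features)):
            child = subset + (features[k],)
            child_blocks = refine(rows, blocks, features[k])  # reuse the parent's partition
            h_x = entropy_of_partition(child_blocks, n)
            h_xy = entropy_of_partition(refine(rows, child_blocks, target), n)
            results[child] = (h_x, h_x + h_y - h_xy)  # I(X;Y) = H(X) + H(Y) - H(X,Y)
            if len(child) < max_size:
                visit(child, child_blocks, k + 1)

    visit((), everything, 0)
    return results


# Toy data: columns 0 and 1 are features, column 2 (the target) is their XOR.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
res = enumerate_entropies(rows, features=[0, 1], target=2, max_size=2)
for subset, (h_x, mi) in sorted(res.items()):
    print(subset, round(h_x, 3), round(mi, 3))
```

On the toy data, each single column has one bit of entropy and zero mutual information with the target. The pair (0, 1) has two bits of entropy and one bit of mutual information. Blocks of size one can never be split further, so a more economical version could set them aside rather than refining them again at every level.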
Pages: 483-499
Page count: 17