An Efficient Algorithm for Computing Entropic Measures of Feature Subsets

Cited by: 0
Authors
Pennerath, Frederic [1 ,2 ]
Affiliations
[1] Univ Lorraine, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
[2] Univ Paris Saclay, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
Keywords
Pattern mining; Entropic measures; Algorithm efficiency; Approximate functional dependency; Pattern redundancy;
DOI
10.1007/978-3-030-10928-8_29
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Entropic measures such as conditional entropy or mutual information have been used numerous times in pattern mining, for instance to characterize valuable itemsets or approximate functional dependencies. Strangely enough, the fundamental problem of designing efficient algorithms to compute the entropy of subsets of features (or the mutual information of feature subsets relative to some target feature) has received little attention compared to the analogous problem of computing the frequency of itemsets. The present article proposes to fill this gap: it introduces a fast and scalable method that computes entropy and mutual information for a large number of feature subsets by adopting the divide-and-conquer strategy used by FP-growth, one of the most efficient frequent itemset mining algorithms. In order to illustrate its practical interest, the algorithm is then used to solve the recently introduced problem of mining reliable approximate functional dependencies. Finally, the article provides empirical evidence that, in the context of non-redundant pattern extraction, the proposed method outperforms existing algorithms in both speed and scalability. Code related to this chapter is available at: https://github.com/P-Fred/HFP-Growth.
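To make the two quantities named in the abstract concrete, the following is a minimal sketch of a naive baseline that computes the empirical entropy of a feature subset and its mutual information with a target feature by direct counting. It is not the article's FP-growth-based method; the function names `entropy` and `mutual_information`, the dict-per-row data layout, and the toy dataset are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(rows, features):
    """Empirical Shannon entropy (in bits) of the joint distribution of `features`.

    `rows` is a list of dicts mapping feature names to values (assumed layout).
    """
    n = len(rows)
    # Count how often each combination of values occurs for the chosen subset.
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(rows, features, target):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for feature subset X and target Y."""
    return (
        entropy(rows, features)
        + entropy(rows, [target])
        - entropy(rows, list(features) + [target])
    )


# Toy dataset (hypothetical): the target y is the XOR of features a and b.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 1},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 0},
]

h_ab = entropy(rows, ["a", "b"])                    # joint entropy of {a, b}
mi_a_y = mutual_information(rows, ["a"], "y")       # a alone says nothing about y
mi_ab_y = mutual_information(rows, ["a", "b"], "y")  # {a, b} determines y
```

This baseline rescans the whole dataset for every subset, so its cost grows with the number of subsets examined. The abstract's point is that a divide-and-conquer strategy in the style of FP-growth avoids that repeated work when many feature subsets must be evaluated.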
Pages: 483-499
Page count: 17
Related papers
50 in total
  • [41] An efficient distributed algorithm for computing association rules
    Li, YJ
    Lin, XM
    Tsang, CP
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2000, 1846 : 109 - 120
  • [42] An Efficient Iterative Algorithm to Explainable Feature Learning
    Vlahek, Dino
    Informatica (Slovenia), 2024, 48 (02): : 289 - 290
  • [43] An efficient feature selection algorithm for hybrid data
    Wang, Feng
    Liang, Jiye
    NEUROCOMPUTING, 2016, 193 : 33 - 41
  • [44] Combining feature subsets in feature selection
    Skurichina, M
    Duin, RPW
    MULTIPLE CLASSIFIER SYSTEMS, 2005, 3541 : 165 - 175
  • [45] An Efficient Marine Predators Algorithm for Feature Selection
    Abd Elminaam, Diaa Salama
    Nabil, Ayman
    Ibraheem, Shimaa A.
    Houssein, Essam H.
    IEEE ACCESS, 2021, 9 : 60136 - 60153
  • [46] An efficient Algorithm for fingerprint preprocessing and feature extraction
    Gnanasivam, P.
    Muttan, S.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE AND EXHIBITION ON BIOMETRICS TECHNOLOGY, 2010, 2 : 133 - 142
  • [47] Efficient Quantum Algorithm for Similarity Measures for Molecules
    Yang, Li-Ping
    Lu, Song-Feng
    Li, Li
    INTERNATIONAL JOURNAL OF THEORETICAL PHYSICS, 2018, 57 (09) : 2854 - 2862
  • [48] An Efficient Algorithm for Identification of Real Belief Measures
    Chen, Wei
    Cao, Kajia
    Jia, Renan
    Chen, Kuiliang
    2009 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING ( GRC 2009), 2009, : 83 - +
  • [49] Efficient Quantum Algorithm for Similarity Measures for Molecules
    Li-Ping Yang
    Song-Feng Lu
    Li Li
    International Journal of Theoretical Physics, 2018, 57 : 2854 - 2862
  • [50] On the Correlation Measures of Subsets
    Liu, Huaning
    Mauduit, Christian
    ANNALS OF COMBINATORICS, 2020, 24 (02) : 311 - 336