An Efficient Algorithm for Computing Entropic Measures of Feature Subsets

Authors
Pennerath, Frederic [1, 2]
Affiliations
[1] Univ Lorraine, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
[2] Univ Paris Saclay, LORIA, CNRS, Cent Supelec, F-57000 Metz, France
Keywords
Pattern mining; Entropic measures; Algorithm efficiency; Approximate functional dependency; Pattern redundancy
DOI
10.1007/978-3-030-10928-8_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Entropic measures such as conditional entropy or mutual information have been used numerous times in pattern mining, for instance to characterize valuable itemsets or approximate functional dependencies. Strangely enough, the fundamental problem of designing efficient algorithms to compute the entropy of subsets of features (or the mutual information of feature subsets relative to some target feature) has received little attention compared to the analogous problem of computing the frequency of itemsets. The present article proposes to fill this gap: it introduces a fast and scalable method that computes entropy and mutual information for a large number of feature subsets by adopting the divide-and-conquer strategy used by FP-growth, one of the most efficient frequent itemset mining algorithms. To illustrate its practical interest, the algorithm is then used to solve the recently introduced problem of mining reliable approximate functional dependencies. Finally, the article provides empirical evidence that, in the context of non-redundant pattern extraction, the proposed method outperforms existing algorithms in both speed and scalability. Code related to this chapter is available at: https://github.com/P-Fred/HFP-Growth.
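For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the empirical entropy of a feature subset and its mutual information with a target feature, computed by direct counting. It is a naive baseline for illustration only, not the FP-growth-style method the article proposes; the function names, the `rows` data layout, and the toy dataset are all assumptions made here.

```python
from collections import Counter
from math import log2


def entropy(rows, features):
    """Empirical Shannon entropy (in bits) of the joint values of `features`.

    `rows` is a list of dicts mapping feature name -> value (hypothetical layout).
    """
    n = len(rows)
    # Count how often each combination of values occurs across the rows.
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(rows, features, target):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for feature subset X and target Y."""
    features = list(features)
    return (
        entropy(rows, features)
        + entropy(rows, [target])
        - entropy(rows, features + [target])
    )


# Toy dataset (made up): y is the XOR of a and b.
rows = [
    {"a": 0, "b": 0, "y": 0},
    {"a": 0, "b": 1, "y": 1},
    {"a": 1, "b": 0, "y": 1},
    {"a": 1, "b": 1, "y": 0},
]

h_ab = entropy(rows, ["a", "b"])                   # joint entropy of {a, b}
mi_ab_y = mutual_information(rows, ["a", "b"], "y")  # {a, b} determines y
mi_a_y = mutual_information(rows, ["a"], "y")        # a alone says nothing about y
```

Each call rescans the whole dataset, so evaluating many feature subsets this way repeats work; avoiding that cost for large numbers of subsets is the gap the abstract says the article addresses.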
Pages: 483-499
Page count: 17