Mining Non-Redundant High Order Correlations in Binary Data

Cited by: 9
Authors
Zhang, Xiang [1]
Pan, Feng [1]
Wang, Wei [1]
Nobel, Andrew [2]
Affiliations
[1] Univ N Carolina, Dept Comp Sci, Chapel Hill, NC 27599 USA
[2] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2008 / Vol. 1 / Issue 01
DOI
10.14778/1453856.1453981
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Many approaches have been proposed to find correlations in binary data. Usually, these methods focus on pair-wise correlations. In biology applications, it is important to find correlations that involve more than just two features. Moreover, a set of strongly correlated features should be non-redundant in the sense that the correlation is strong only when all the interacting features are considered together. Removing any feature will greatly reduce the correlation. In this paper, we explore the problem of finding non-redundant high order correlations in binary data. The high order correlations are formalized using multi-information, a generalization of pair-wise mutual information. To reduce the redundancy, we require any subset of a strongly correlated feature subset to be weakly correlated. Such feature subsets are referred to as Non-redundant Interacting Feature Subsets (NIFS). Finding all NIFSs is computationally challenging, because in addition to enumerating feature combinations, we also need to check all their subsets for redundancy. We study several properties of NIFSs and show that these properties are useful in developing efficient algorithms. We further develop two sets of upper and lower bounds on the correlations, which can be incorporated in the algorithm to prune the search space. A simple and effective pruning strategy based on pair-wise mutual information is also developed to further prune the search space. The efficiency and effectiveness of our approach are demonstrated through extensive experiments on synthetic and real-life datasets.
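The two definitions in the abstract can be illustrated with a minimal Python sketch. It assumes multi-information is computed as the sum of the empirical marginal entropies minus the empirical joint entropy, and that "strongly" and "weakly" correlated are decided by two thresholds. The function names and the threshold parameters `strong` and `weak` are hypothetical and not taken from the paper. The check enumerates every subset by brute force and uses none of the bounds or pruning strategies the abstract mentions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Empirical joint entropy, in bits, of the selected columns."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def multi_information(rows, cols):
    """Sum of the marginal entropies minus the joint entropy."""
    return sum(entropy(rows, [c]) for c in cols) - entropy(rows, cols)


def is_nifs(rows, cols, strong, weak):
    """Brute-force check: cols is strongly correlated as a whole, and every
    proper subset with at least two features is weakly correlated.
    `strong` and `weak` are illustrative thresholds (assumptions)."""
    if multi_information(rows, cols) < strong:
        return False
    return all(
        multi_information(rows, subset) <= weak
        for size in range(2, len(cols))
        for subset in combinations(cols, size)
    )


# Toy data: the third feature is the XOR of the first two, so no pair of
# features is correlated, but all three together are.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print(multi_information(rows, (0, 1, 2)))         # 1.0
print(multi_information(rows, (0, 1)))            # 0.0
print(is_nifs(rows, (0, 1, 2), strong=0.9, weak=0.1))  # True
```

The XOR data is the standard case where pair-wise measures see nothing: each pair has zero mutual information, while the triple carries one full bit, so removing any feature destroys the correlation.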
Pages: 1178-1188
Page count: 11