New algorithms for finding approximate frequent item sets

Cited by: 7
Authors
Borgelt, Christian [1]
Braune, Christian [1, 2]
Koetter, Tobias [3]
Gruen, Sonja [4, 5]
Affiliations
[1] European Ctr Soft Comp, Mieres 33600, Asturias, Spain
[2] Otto Von Guericke Univ, Dept Comp Sci, D-39106 Magdeburg, Germany
[3] Univ Konstanz, Dept Comp Sci, D-78457 Constance, Germany
[4] RIKEN, Brain Sci Inst, Wako, Saitama 3510198, Japan
[5] Res Ctr Julich, Inst Neurosci & Med INM 6, Julich, Germany
Keywords
ASSOCIATION; NOISE;
DOI
10.1007/s00500-011-0776-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In standard frequent item set mining a transaction supports an item set only if all items in the set are present. However, in many cases this is too strict a requirement that can render it impossible to find certain relevant groups of items. By relaxing the support definition, allowing for some items of a given set to be missing from a transaction, this drawback can be amended. The resulting item sets have been called approximate, fault-tolerant or fuzzy item sets. In this paper we present two new algorithms to find such item sets: the first is an extension of item set mining based on cover similarities and computes and evaluates the subset size occurrence distribution with a scheme that is related to the Eclat algorithm. The second employs a clustering-like approach, in which the distances are derived from the item covers with distance measures for sets or binary vectors and which is initialized with a one-dimensional Sammon projection of the distance matrix. We demonstrate the benefits of our algorithms by applying them to a concept detection task on the 2008/2009 Wikipedia Selection for schools and to the neurobiological task of detecting neuron ensembles in (simulated) parallel spike trains.
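The relaxed support definition and the cover-based distances mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names, the `max_missing` parameter, the toy transactions, and the choice of the Jaccard distance as the set distance are all assumptions made for illustration, and the paper's actual support definition may weight missing items differently.

```python
from typing import FrozenSet, Iterable, List, Set

Transaction = FrozenSet[str]


def exact_support(transactions: List[Transaction], item_set: Iterable[str]) -> int:
    """Standard support: count transactions that contain every item of the set."""
    items = frozenset(item_set)
    return sum(1 for t in transactions if items <= t)


def approximate_support(
    transactions: List[Transaction], item_set: Iterable[str], max_missing: int
) -> int:
    """Relaxed support (illustrative): count transactions that miss
    at most `max_missing` items of the set. `max_missing` is a hypothetical
    parameter, not a name taken from the paper."""
    items = frozenset(item_set)
    return sum(1 for t in transactions if len(items - t) <= max_missing)


def cover(transactions: List[Transaction], item: str) -> Set[int]:
    """Cover of an item: indices of the transactions that contain it."""
    return {i for i, t in enumerate(transactions) if item in t}


def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """Jaccard distance between two covers; one possible set distance (assumed here)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy data, invented for this example.
transactions: List[Transaction] = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c", "d"}),
    frozenset({"d"}),
]

if __name__ == "__main__":
    print(exact_support(transactions, {"a", "b", "c"}))
    print(approximate_support(transactions, {"a", "b", "c"}, max_missing=1))
    print(jaccard_distance(cover(transactions, "a"), cover(transactions, "b")))
```

In this toy data only one transaction contains all of `a`, `b` and `c`, while four transactions miss at most one of them, which shows how the relaxed definition can surface a group of items that exact support would rate as rare.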
Pages: 903-917
Page count: 15