Using category-based adherence to cluster market-basket data

Authors
Yun, CH [1]
Chuang, KT [1]
Chen, MS [1]
Affiliation
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Dept Elect Engn, Taipei 10764, Taiwan
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse and to contain massive numbers of outliers. Because they do not explicitly consider the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy, however, is of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item (i.e., leaf) node or a category (i.e., internal) node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this measurement, we develop an efficient clustering algorithm, called algorithm CBA (standing for Category-Based Adherence), for market-basket data, with the objective of minimizing the category-based adherence. A validation model based on Information Gain (IG) is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, by using the taxonomy information, algorithm CBA significantly outperforms prior works in both execution efficiency and clustering quality for market-basket data.
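The two definitions in the abstract (item-to-cluster distance and category-based adherence) can be illustrated with a minimal Python sketch. It is not the authors' algorithm CBA, only the measurement, and it rests on several assumptions that the abstract does not state: the toy taxonomy `PARENT` is hypothetical, the "nearest large node" is searched for only among the item itself and its ancestors, a category's occurrence count is taken to be the number of cluster transactions containing at least one item beneath it, and an item with no large node on its path to the root is assigned a penalty equal to the path length.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
    "bagel": "bakery",
}


def ancestors_and_self(node: str) -> List[str]:
    """Return the path from `node` up to the root, starting with `node`."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def occurrence_counts(cluster: Iterable[Set[str]]) -> Dict[str, int]:
    """Count, per node, the transactions that contain it or an item below it.

    Assumption: a category occurs in a transaction if any of its
    descendant items does; each transaction is counted at most once.
    """
    counts: Dict[str, int] = {}
    for transaction in cluster:
        covered: Set[str] = set()
        for item in transaction:
            covered.update(ancestors_and_self(item))
        for node in covered:
            counts[node] = counts.get(node, 0) + 1
    return counts


def item_distance(item: str, large_nodes: Set[str]) -> int:
    """Number of links from `item` up to its nearest large node."""
    path = ancestors_and_self(item)
    for links, node in enumerate(path):
        if node in large_nodes:
            return links
    # Assumed penalty when no node on the path is large.
    return len(path)


def adherence(transaction: Set[str], cluster: List[Set[str]], threshold: int) -> float:
    """Average item distance of `transaction` to `cluster` (lower is closer)."""
    if not transaction:
        raise ValueError("transaction must contain at least one item")
    counts = occurrence_counts(cluster)
    # A node is large when its occurrence count exceeds the threshold.
    large_nodes = {node for node, count in counts.items() if count > threshold}
    return sum(item_distance(item, large_nodes) for item in transaction) / len(transaction)
```

Under these assumptions, a transaction whose items are themselves large in a cluster has adherence 0, and the value grows as its items match the cluster only at higher category levels. A clustering procedure that minimizes this value would assign each transaction to the cluster with the smallest adherence.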
Pages: 546-553
Number of pages: 8