Using category-based adherence to cluster market-basket data

Cited by: 0
Authors
Yun, CH [1]
Chuang, KT [1]
Chen, MS [1]
Affiliation
[1] Natl Taiwan Univ, Grad Inst Commun Engn, Dept Elect Engn, Taipei 10764, Taiwan
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we devise an efficient algorithm for clustering market-basket data. Unlike traditional data, market-basket data are known to be high-dimensional and sparse, and to contain massive outliers. Because they do not explicitly consider the taxonomy, most prior efforts on clustering market-basket data can be viewed as dealing only with items at the leaf level of the taxonomy tree. Clustering transactions across different levels of the taxonomy is, however, of great importance both for marketing strategies and for representing the results of clustering techniques for market-basket data. In view of these features, we devise a novel measurement, called the category-based adherence, and use it to perform the clustering. The distance of an item to a given cluster is defined as the number of links between this item and its nearest large node in the taxonomy tree, where a large node is an item (i.e., leaf) node or a category (i.e., internal) node whose occurrence count exceeds a given threshold. The category-based adherence of a transaction to a cluster is then defined as the average distance of the items in this transaction to that cluster. With this measurement, we develop an efficient clustering algorithm, called algorithm CBA (standing for Category-Based Adherence), for market-basket data, with the objective of minimizing the category-based adherence. A validation model based on Information Gain (IG) is also devised to assess the quality of clustering for market-basket data. Experimental results on both real and synthetic datasets show that, with the taxonomy information, algorithm CBA significantly outperforms prior works in both execution efficiency and clustering quality for market-basket data.
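The two definitions in the abstract (item-to-cluster distance and transaction-to-cluster adherence) can be illustrated with a minimal Python sketch. It is not the authors' CBA implementation. Several details are assumptions, since the abstract does not specify them: the toy taxonomy `PARENT`, the rule that a node's occurrence count is the number of cluster transactions containing the item or any item beneath the category, the restriction of the "nearest large node" search to the item and its ancestors, and the fallback distance used when no large node is found.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "food": None,
    "dairy": "food",
    "bakery": "food",
    "milk": "dairy",
    "cheese": "dairy",
    "bread": "bakery",
}


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the nodes from `node` up to the root, starting with `node`."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = parent[current]
    return path


def occurrence_counts(cluster: Iterable[Iterable[str]],
                      parent: Dict[str, Optional[str]]) -> Dict[str, int]:
    """Count the transactions in `cluster` that touch each taxonomy node.

    Assumed rule: a transaction touches a node if it contains that item
    or any item beneath that category.
    """
    counts = {node: 0 for node in parent}
    for transaction in cluster:
        touched = set()
        for item in transaction:
            touched.update(path_to_root(item, parent))
        for node in touched:
            counts[node] += 1
    return counts


def item_distance(item: str, counts: Dict[str, int],
                  parent: Dict[str, Optional[str]], threshold: int) -> int:
    """Number of links from `item` up to its nearest large node.

    A node is large when its occurrence count exceeds `threshold`.
    Assumption: only the item and its ancestors are searched.
    """
    path = path_to_root(item, parent)
    for links, node in enumerate(path):
        if counts[node] > threshold:
            return links
    # Assumed fallback: one link beyond the root when nothing is large.
    return len(path)


def adherence(transaction: Iterable[str], cluster: Iterable[Iterable[str]],
              parent: Dict[str, Optional[str]], threshold: int) -> float:
    """Average item distance of `transaction` to `cluster` (lower is closer)."""
    items = list(transaction)
    if not items:
        raise ValueError("transaction must contain at least one item")
    counts = occurrence_counts(cluster, parent)
    return sum(item_distance(i, counts, parent, threshold) for i in items) / len(items)


# Hypothetical cluster of three transactions.
CLUSTER = [["milk", "bread"], ["milk", "cheese"], ["cheese"]]
```

With a threshold of 1, `milk` is itself large in `CLUSTER` (distance 0), while `bread` reaches its nearest large node only at `food` (distance 2), so `adherence(["milk", "bread"], CLUSTER, PARENT, 1)` averages the two distances. A clustering step in this spirit would assign each transaction to the cluster with the smallest adherence value.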
Pages: 546-553
Page count: 8