Sequence-based clustering for Web usage mining: A new experimental framework and ANN-enhanced K-means algorithm

Cited by: 35
Authors
Park, Sungjune [1]
Suresh, Nallan C. [2]
Jeong, Bong-Keun [1]
Affiliations
[1] Univ N Carolina, Business Informat Syst & Operat Management, Belk Coll Business, Charlotte, NC 28223 USA
[2] SUNY Buffalo, Dept Operat Management & Strategy, Sch Management, Buffalo, NY 14260 USA
Keywords
Web usage mining; clustering methods; simulation; artificial intelligence; Markov chain;
DOI
10.1016/j.datak.2008.01.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We develop a general sequence-based clustering method by proposing new sequence representation schemes in association with Markov models. The resulting sequence representations allow for calculation of vector-based distances (dissimilarities) between Web user sessions and thus can be used as inputs of various clustering algorithms. We develop an evaluation framework in which the performances of the algorithms are compared in terms of whether the clusters (groups of Web users who follow the same Markov process) are correctly identified using a replicated clustering approach. A series of experiments is conducted to investigate whether clustering performance is affected by different sequence representations and different distance measures as well as by other factors such as number of actual Web user clusters, number of Web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form. A new, fuzzy ART-enhanced K-means algorithm is also developed, and its superior performance is demonstrated. Published by Elsevier B.V.
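The idea of turning a session into a vector so that distances can be computed may be easier to see with a small sketch. The code below is only an illustration under stated assumptions, not the representation scheme proposed in the paper: it assumes each session is a list of integer page indices, encodes it as a flattened matrix of first-order (Markov-style) transition frequencies, and compares sessions with Euclidean distance. The names `transition_vector`, `euclidean`, and the sample sessions are hypothetical.

```python
from math import sqrt


def transition_vector(session, n_pages):
    """Encode one session as normalized first-order transition frequencies.

    Assumption: pages are integers in [0, n_pages). Entry src * n_pages + dst
    holds the share of transitions in the session that went from src to dst.
    """
    counts = [0.0] * (n_pages * n_pages)
    for src, dst in zip(session, session[1:]):
        counts[src * n_pages + dst] += 1.0
    total = sum(counts)
    # A session with fewer than two page views has no transitions to normalize.
    return [c / total for c in counts] if total else counts


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical sessions over three pages (0, 1, 2).
sessions = [
    [0, 1, 2, 1, 2],
    [0, 1, 2, 2, 1],
    [2, 0, 2, 0, 2],
]
vectors = [transition_vector(s, 3) for s in sessions]

d01 = euclidean(vectors[0], vectors[1])  # sessions with similar navigation
d02 = euclidean(vectors[0], vectors[2])  # sessions with different navigation
print(d01 < d02)
```

Vectors of this kind could be passed to any vector-based clustering routine such as K-means; the choice of representation and distance measure is exactly what the abstract says the experiments vary.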
Pages: 512-543
Page count: 32
Related papers
50 in total
  • [1] A new k-means based clustering algorithm in aspect mining
    Serban, Gabriela
    Moldovan, Grigoreta Sofia
    [J]. SYNASC 2006: EIGHTH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, PROCEEDINGS, 2007, : 69 - +
  • [2] A Modified K-means Algorithm for Sequence Clustering
    Hsu, Jia-Lien
    Yang, Hong-Xiang
    [J]. HIS 2009: 2009 NINTH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS, VOL 1, PROCEEDINGS, 2009, : 287 - 292
  • [3] Application of Density-based Adaptive K-Means Clustering Algorithm in Web Log Mining
    Guo, Guang Nan
    Yun, Yong Gang
    Chu Mei
    Shi, Hong Yan
    Yin, Ke Gong
    [J]. MATERIALS SCIENCE AND INFORMATION TECHNOLOGY, PTS 1-8, 2012, 433-440 : 5152 - 5156
  • [4] Efficient enhanced k-means clustering algorithm
    Fahim A.M.
    Salem A.M.
    Torkey F.A.
    Ramadan M.A.
    [J]. Journal of Zhejiang University-SCIENCE A, 2006, 7 (10): : 1626 - 1633
  • [5] A K-means Clustering Algorithm Based on Enhanced Differential Evolution
    Mao, Li
    Gong, Huaijin
    Liu, Xingyang
    [J]. ADVANCED MANUFACTURING SYSTEMS, 2011, 339 : 71 - 75
  • [6] An efficient enhanced k-means clustering algorithm
    FAHIM A.M
    SALEM A.M
    TORKEY F.A
    RAMADAN M.A
    [J]. Journal of Zhejiang University-Science A(Applied Physics & Engineering), 2006, (10) : 1626 - 1633
  • [7] A k-means based clustering algorithm
    Bloisi, Domenico Daniele
    Locchi, Luca
    [J]. COMPUTER VISION SYSTEMS, PROCEEDINGS, 2008, 5008 : 109 - 118
  • [8] Parallelization of K-Means Clustering Algorithm for Data Mining
    Jiang, Hao
    Yu, Liyan
    [J]. 4TH ANNUAL INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND APPLICATIONS (ITA 2017), 2017, 12
  • [9] k*-means: A new generalized k-means clustering algorithm
    Cheung, YM
    [J]. PATTERN RECOGNITION LETTERS, 2003, 24 (15) : 2883 - 2893
  • [10] An Improved K-means Algorithm for DNA Sequence Clustering
    Aleb, Nassima
    Labidi, Narimane
    [J]. 2015 26TH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS (DEXA), 2015, : 39 - 42