Clustering algorithms and latent semantic indexing to identify similar pages in web applications

Cited by: 0
Authors
De Lucia, Andrea [1]
Risi, Michele [1]
Tortora, Genoveffa [1]
Scanniello, Giuseppe [1]
Affiliation
[1] Univ Salerno, Dipartimento Matemat & Informat, Via Ponte Don Melillo, I-84084 Fisciano, SA, Italy
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
In this paper, we analyze several clustering algorithms that have been widely employed in the past to support the comprehension of web applications. To this end, we have defined an approach to identify static pages that are duplicated or cloned at the content level. The approach is based on a process that first computes the dissimilarity between web pages using Latent Semantic Indexing, a well-known information retrieval technique, and then groups similar pages using clustering algorithms. We consider five instances of this process, based on three variants of the agglomerative hierarchical clustering algorithm, a divisive clustering algorithm, the k-means partitional clustering algorithm, and a widely employed partitional competitive clustering algorithm, namely Winner Takes All. To assess the proposed approach, we have used the static pages of three web applications and one static website.
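The two-step process described in the abstract (LSI-based dissimilarity between pages, then clustering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term-by-page matrix is assumed to be already built from the page contents, dissimilarity is taken as 1 minus the cosine similarity of pages in a rank-k LSI space, and pages are grouped with a simple threshold-cut agglomerative clustering offering single, complete and average linkage. The function names `lsi_dissimilarity` and `agglomerative`, the parameters `k` and `threshold`, and the choice of these three linkages are hypothetical, illustrative choices rather than details taken from the paper.

```python
from __future__ import annotations

import numpy as np


def lsi_dissimilarity(term_page: np.ndarray, k: int) -> np.ndarray:
    """Pairwise dissimilarity (1 - cosine similarity) between pages.

    term_page: terms x pages matrix (one column per web page).
    k: number of latent dimensions kept from the SVD (hypothetical choice).
    """
    # Truncated SVD: A ~= U_k S_k V_k^T
    _, s, vt = np.linalg.svd(term_page, full_matrices=False)
    k = min(k, len(s))
    pages = (np.diag(s[:k]) @ vt[:k, :]).T  # one row per page in LSI space
    norms = np.linalg.norm(pages, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty pages
    unit = pages / norms
    sim = np.clip(unit @ unit.T, -1.0, 1.0)
    return 1.0 - sim


def agglomerative(dist: np.ndarray, threshold: float,
                  linkage: str = "average") -> list[list[int]]:
    """Repeatedly merge the closest pair of clusters until the closest pair
    is farther apart than `threshold`. Returns lists of page indices."""
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")
    clusters = [[i] for i in range(dist.shape[0])]

    def link(a: list[int], b: list[int]) -> float:
        d = [dist[i, j] for i in a for j in b]
        if linkage == "single":
            return min(d)
        if linkage == "complete":
            return max(d)
        return sum(d) / len(d)

    while len(clusters) > 1:
        best = None  # (distance, index_x, index_y)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break  # even the closest pair is too dissimilar to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters


if __name__ == "__main__":
    # Toy term x page matrix: pages 0/1 share vocabulary, as do pages 2/3.
    A = np.array([[2, 2, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 3, 3],
                  [0, 0, 1, 1]], dtype=float)
    D = lsi_dissimilarity(A, k=2)
    for method in ("single", "complete", "average"):
        print(method, agglomerative(D, threshold=0.5, linkage=method))
```

In the toy matrix, pages 0 and 1 (and likewise pages 2 and 3) have identical term profiles, so their LSI dissimilarity is essentially 0 and each pair ends up in one cluster, while the two groups are orthogonal (dissimilarity close to 1) and stay apart at the 0.5 cut-off. The naive pairwise search is only meant for small page sets; larger sites would call for a library routine such as SciPy's hierarchical clustering.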
Pages: 65 / +
Number of pages: 3
Related papers
43 in total
  • [1] Comparing clustering algorithms for the identification of similar pages in web applications
    De Lucia, Andrea
    Risi, Michele
    Scanniello, Giuseppe
    Tortora, Genoveffa
    [J]. WEB ENGINEERING, PROCEEDINGS, 2007, 4607 : 415 - +
  • [2] AN INVESTIGATION OF CLUSTERING ALGORITHMS IN THE IDENTIFICATION OF SIMILAR WEB PAGES
    De Lucia, Andrea
    Risi, Michele
    Scanniello, Giuseppe
    Tortora, Genoveffa
    [J]. JOURNAL OF WEB ENGINEERING, 2009, 8 (04) : 346 - 370
  • [3] Analysis of web clustering based on genetic algorithm with latent semantic indexing technology
    Song, Wei
    Park, Soon Cheol
    [J]. ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, 2007, : 21 - +
  • [4] Identifying similar pages in Web applications using a competitive clustering algorithm
    De Lucia, Andrea
    Scanniello, Giuseppe
    Tortora, Genoveffa
    [J]. JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE, 2007, 19 (05) : 281 - 296
  • [5] FAST UPDATING ALGORITHMS FOR LATENT SEMANTIC INDEXING
    Vecharynski, Eugene
    Saad, Yousef
    [J]. SIAM JOURNAL ON MATRIX ANALYSIS AND APPLICATIONS, 2014, 35 (03) : 1105 - 1131
  • [6] Latent semantic indexing for web service retrieval
    Czyszczoń, Adam
    [J]. Springer Verlag, 8733
  • [7] Latent Semantic Indexing for Web Service Retrieval
    Czyszczon, Adam
    Zgrzywa, Aleksander
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, ICCCI 2014, 2014, 8733 : 694 - 702
  • [8] Gene clustering by Latent Semantic Indexing of MEDLINE abstracts
    Homayouni, R
    Heinrich, K
    Wei, L
    Berry, MW
    [J]. BIOINFORMATICS, 2005, 21 (01) : 104 - 115
  • [9] Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
    Urae, Hiroshi
    Tezuka, Taro
    Kimura, Fuminori
    Maeda, Akira
    [J]. INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 662 - 667
  • [10] An RDF-based framework for Semantic Indexing of web pages
    Amato, F.
    Moscato, V.
    Persia, F.
    Picariello, A.
    Gargiulo, F.
    [J]. 2013 IEEE SEVENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2013), 2013, : 395 - +