Topic distillation and clustering algorithm based on the topology of pages-keywords

Authors
Deng, Jian-Shuang [1]
Zheng, Qi-Lun [1]
Peng, Hong [1]
Affiliations
[1] South China Univ Technol, Dept Comp Sci, Guangzhou 510641, Peoples R China
Keywords
HITS; topic extraction; community search; topic clustering
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The HITS algorithm has been highly successful and is widely applied in web link analysis. It identifies authority pages and hub pages among the results returned by a search engine, and it can also be used to find web communities. However, because HITS relies on the hyperlinks between pages, it is prone to topic drift. It also requires a set of linked pages as its base set, so it cannot be applied to plain text. This paper introduces a new algorithm, PK-TDC, which borrows the iterative idea of HITS. PK-TDC finds authority pages and keywords on the page-keyword topology and clusters pages by the keywords they contain. Experiments show that PK-TDC performs well at topic extraction and clustering, both on hyperlinked pages and on plain text.
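The abstract does not state PK-TDC's update rules, so the following is only a rough sketch of the general idea it describes: HITS-style mutual reinforcement run on a page-keyword graph instead of a hyperlink graph, followed by grouping pages by keyword. The function names, the L2 normalization, the fixed iteration count, and the "group by highest-scoring keyword" rule are all assumptions made for illustration, not the authors' method.

```python
import math


def page_keyword_scores(pages, iterations=50):
    """HITS-style iteration on a bipartite page-keyword graph.

    pages: dict mapping a page id to the set of keywords it contains.
    Returns (page_score, keyword_score) dicts.
    NOTE: hypothetical sketch; not the published PK-TDC procedure.
    """
    keywords = sorted({k for kws in pages.values() for k in kws})
    page_score = {p: 1.0 for p in pages}
    kw_score = {k: 1.0 for k in keywords}

    def normalize(scores):
        # L2 normalization keeps the values bounded across iterations.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {key: v / norm for key, v in scores.items()}

    for _ in range(iterations):
        # A keyword scores highly if high-scoring pages contain it.
        kw_score = normalize({
            k: sum(page_score[p] for p, kws in pages.items() if k in kws)
            for k in keywords
        })
        # A page scores highly if it contains high-scoring keywords.
        page_score = normalize({
            p: sum(kw_score[k] for k in kws) for p, kws in pages.items()
        })
    return page_score, kw_score


def cluster_by_top_keyword(pages, kw_score):
    """Group pages under their highest-scoring keyword (assumed rule)."""
    clusters = {}
    for page, kws in pages.items():
        if not kws:
            continue  # a page without keywords cannot be assigned
        top = max(sorted(kws), key=lambda k: kw_score[k])
        clusters.setdefault(top, []).append(page)
    return clusters


if __name__ == "__main__":
    # Toy input: plain-text "pages" with no hyperlinks at all.
    docs = {
        "p1": {"hits", "hub", "authority"},
        "p2": {"hits", "authority"},
        "p3": {"hits", "link"},
        "p4": {"cluster", "kmeans"},
        "p5": {"cluster"},
    }
    page_score, kw_score = page_keyword_scores(docs)
    print(cluster_by_top_keyword(docs, kw_score))
```

Because the only input is which keywords each page contains, a sketch like this runs on plain text as well as on hyperlinked pages, which is the property the abstract emphasizes.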
Pages: 1581 / +
Page count: 2