Research and improvement on topic distillation algorithm in web mining

Times cited: 0
Authors
Wang, Bao-Yi [1 ]
Ding, Juan [1 ]
Affiliation
[1] North China Electric Power University, School of Computer Science and Technology, Baoding 071003, People's Republic of China
Keywords
hypertext induced topic search; topic distillation; topic drift; link analyses
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
This paper studies the Hypertext Induced Topic Search (HITS) algorithm, a topic search algorithm based on hyperlinks. To address the topic drift problem in HITS, the ideas behind existing modified algorithms are reviewed. Text-analysis methods, such as weighting links according to the relevance of their surrounding (ambient) text, inevitably increase complexity. Methods that exploit link-structure information, such as the root-set eigenvector projection method and the similarity-based topic distillation algorithm, incur a large computational cost. The likely causes of topic drift are therefore analyzed, and an improved hyperlink-based algorithm, called the base-set shrinking method, is proposed. Theoretical analysis and experiments show that this algorithm not only avoids topic drift but also substantially reduces computational cost.
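For reference, the baseline HITS procedure that the abstract builds on can be sketched as follows: a root set of pages is expanded into a base set (pages the root pages link to, plus a capped number of pages linking to them), and hub and authority scores are then updated alternately and normalized until they converge. This is a minimal sketch of standard Kleinberg-style HITS, not the paper's base-set shrinking method, whose details are not given in this record. The function names `build_base_set` and `hits`, the `max_in` cap, and the toy link graph are hypothetical choices made for illustration. Because every page pulled into the base set takes part in the mutual reinforcement, densely linked off-topic pages can end up with top scores, which is one commonly cited explanation for topic drift.

```python
from math import sqrt


def build_base_set(root, links, max_in=50):
    """Expand a root set into a HITS base set (standard Kleinberg-style expansion).

    root:   iterable of page ids (e.g. top search results for a query)
    links:  dict mapping page id -> iterable of page ids it links to
    max_in: hypothetical cap on in-linking pages added per root page
    """
    # Reverse index: page -> set of pages that link to it
    in_links = {}
    for src, targets in links.items():
        for dst in targets:
            in_links.setdefault(dst, set()).add(src)
    base = set(root)
    for page in root:
        base.update(links.get(page, ()))                      # pages the root page points to
        base.update(sorted(in_links.get(page, ()))[:max_in])  # capped pages pointing to it
    return base


def hits(links, nodes, iterations=100, tol=1e-10):
    """Iterate hub/authority scores on the subgraph induced by `nodes`."""
    # Keep only edges whose endpoints both lie in the base set
    edges = {(s, t) for s in nodes for t in links.get(s, ()) if t in nodes}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of p = sum of hub scores of pages linking to p
        new_auth = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_auth[t] += hub[s]
        # Hub of p = sum of (updated) authority scores of pages p links to
        new_hub = dict.fromkeys(nodes, 0.0)
        for s, t in edges:
            new_hub[s] += new_auth[t]
        # Normalize both vectors to unit Euclidean length
        for vec in (new_auth, new_hub):
            norm = sqrt(sum(v * v for v in vec.values())) or 1.0
            for n in vec:
                vec[n] /= norm
        delta = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return hub, auth


# Toy link graph (hypothetical data for illustration only)
links = {"a": ["c", "d"], "b": ["c"], "c": ["e"], "x": ["a"]}
base = build_base_set({"a", "b"}, links)
hub, auth = hits(links, base)
print(sorted(base))                                    # ['a', 'b', 'c', 'd', 'x']
print(max(auth, key=auth.get), max(hub, key=hub.get))  # c a
```

Page `e` stays out of the base set because it is two links away from the root set; an approach that shrinks the base set would act on exactly this expansion step, which is where off-topic pages enter.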
Pages: 1561 / +
Number of pages: 2