Chinese Hot Topic Extraction Based on Web Log

Cited by: 1
Authors
Li, Junhua [1]
Liu, Zhen [1]
Fu, Yan [1]
She, Li [1]
Affiliation
[1] University of Electronic Science and Technology of China, School of Computer Science and Engineering, Chengdu 610054, People's Republic of China
Keywords
Chinese hot topic extraction; theme extraction; web log
DOI
10.1109/WISM.2009.29
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional topic extraction methods consider only the text documents and ignore users' contribution to the extraction process. However, users' browsing behavior on a topic indicates whether that topic is currently hot more strongly than the properties of the text documents do. In this paper, we therefore propose a method for extracting "Chinese hot topics" from a set of text documents downloaded from the Internet according to the web log. The method has three major steps. First, we collect all relevant user information and the textual material from the web according to the web log. Second, we extract the hot terms of each web page and compute the hotness of each theme based on click-through rate and a forgetting factor. Finally, we form hot topics by merging correlated themes on the basis of common hot terms. The method handles massive textual data efficiently and brings a new, user-centered angle to determining whether a topic is hot. We test the method on data from several portal sites and find that it efficiently detects the topics with the highest hotness.
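The abstract does not give the formulas, so the following is only a minimal sketch of the second and third steps under stated assumptions. It assumes the forgetting factor acts as an exponential decay on each click's age, and that themes are merged whenever they share at least one hot term. The names `theme_hotness`, `merge_themes`, `decay`, and `min_shared` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def theme_hotness(click_times, now, decay=0.1):
    """Sum the clicks on a theme, discounting each by its age.

    Assumption: the forgetting factor is modelled as exp(-decay * age);
    the paper's actual formula is not stated in the abstract.
    """
    return sum(math.exp(-decay * (now - t)) for t in click_times if t <= now)


def merge_themes(theme_terms, min_shared=1):
    """Group themes that share at least `min_shared` hot terms.

    `theme_terms` maps a theme name to its set of hot terms.
    Merging is transitive and implemented with a small union-find.
    """
    parent = {name: name for name in theme_terms}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(theme_terms)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(theme_terms[a] & theme_terms[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())
```

In this sketch a recent click contributes close to 1 and an old click contributes close to 0, so a theme's hotness falls over time unless users keep visiting it. The pairwise comparison in `merge_themes` is quadratic in the number of themes; a large collection would likely need an inverted index from hot term to themes instead.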
Pages: 103-107
Page count: 5