Chinese Hot Topic Extraction Based on Web Log

Cited by: 1
Authors
Li, Junhua [1]
Liu, Zhen [1]
Fu, Yan [1]
She, Li [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Peoples R China
Keywords
Chinese Hot topic extraction; theme extraction; web log;
DOI
10.1109/WISM.2009.29
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional topic extraction methods consider only the text documents and ignore the users' contribution to the extraction process. However, how users browse a topic indicates whether that topic is currently hot more strongly than the properties of the text documents do. In this paper, we propose a method for extracting "Chinese hot topics" from a set of text documents downloaded from the Internet, guided by the web log. The method has three major steps. First, we collect the relevant user information and the textual material from the web according to the web log. Second, we extract the hot terms of each web page and compute the hotness of each theme from its click-through rate and a forgetting factor. Finally, we form hot topics by merging correlated themes on the basis of their common hot terms. The method handles massive textual data efficiently and adds a user-centred perspective for deciding whether a topic is hot. We test the method on data from several portal sites and find that it efficiently detects the topics with the highest hotness.
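The second and third steps of the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption: the exponential form of the forgetting factor, the `decay` parameter, the rule that two themes merge when they share at least `min_shared` hot terms, and all function names and sample data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def theme_hotness(click_times, now, decay=0.1):
    """Hotness of one theme: each click counts less the older it is.

    Assumption: the forgetting factor is modelled as exp(-decay * age),
    with times measured in days. The paper may use a different form.
    """
    return sum(math.exp(-decay * (now - t)) for t in click_times if t <= now)


def merge_themes(theme_terms, min_shared=1):
    """Group themes that share at least `min_shared` hot terms.

    Uses a small union-find so that sharing is transitive:
    if A overlaps B and B overlaps C, all three form one topic.
    """
    parent = {name: name for name in theme_terms}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    names = list(theme_terms)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(theme_terms[a] & theme_terms[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data: click timestamps (in days) and hot terms per theme.
clicks = {
    "quake-relief": [9.0, 9.5, 10.0],
    "quake-donations": [8.0, 10.0],
    "football-final": [1.0, 2.0],
}
terms = {
    "quake-relief": {"earthquake", "rescue"},
    "quake-donations": {"earthquake", "donation"},
    "football-final": {"football", "final"},
}

topics = merge_themes(terms)
# Topic hotness is taken here as the sum of its themes' hotness (an assumption).
topic_hotness = {
    tuple(topic): sum(theme_hotness(clicks[name], now=10.0) for name in topic)
    for topic in topics
}
hottest = max(topic_hotness, key=topic_hotness.get)
```

With this sample data the two earthquake themes merge into one topic because they share the term "earthquake", and that topic ranks hottest because its clicks are both more numerous and more recent than those of the football theme.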
Pages: 103-107
Page count: 5
Related papers
50 in total
  • [31] Research and Implementation of a Method for Web Log Analysis Template Extraction
    Ni, Zengchao
    Liu, Hongqi
    Chen, Yuanping
    Wu, Dengsheng
    7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT (ITQM 2019): INFORMATION TECHNOLOGY AND QUANTITATIVE MANAGEMENT BASED ON ARTIFICIAL INTELLIGENCE, 2019, 162 : 673 - 681
  • [32] Efficient Extraction for Mobile Web Access Log with Caching Strategy
    Gao, Lifeng
    Zhu, Min
    Li, Mengying
    Cao, Yu
    Zhang, Weixue
    JOURNAL OF COMPUTERS, 2016, 11 (02) : 99 - 108
  • [33] Design and implementation of Web hot-topic talk mining based on scale-free network
    Qin, Sen
    Dai, Guan-Zhong
    Li, Yan-Ling
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 1184 - +
  • [34] Mining Web Log Data for News Topic Modeling Using Latent Dirichlet Allocation
    Surjandari, Isti
    Rosyidah, Asma
    Zulkarnain
    Laoh, Enrico
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 331 - 335
  • [35] Internet Users' Psychosocial Attention Prediction: Web Hot Topic Prediction Based on Adaptive AR Model
    Tong, Hengqing
    Liu, Yang
    Peng, Hui
    Tang, Jing
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, 2008, : 458 - 462
  • [36] The Web Versus Digital Libraries: Time to Revisit This Once Hot Topic
    Casarosa, Vittore
    Cousins, Jill
    Tammaro, Anna Maria
    Ioannidis, Yannis
    RESEARCH AND ADVANCED TECHNOLOGY FOR DIGITAL LIBRARIES, 2008, 5173 : 383 - +
  • [39] Modeling Chinese Microblogs with Five Ws for Topic Hashtags Extraction
    Zhao, Zhibin
    Sun, Jiahong
    Yao, Lan
    Wang, Xun
    Chu, Jiahong
    Liu, Huan
    Yu, Ge
    TSINGHUA SCIENCE AND TECHNOLOGY, 2017, 22 (02) : 135 - 148