A dynamic hot topic extraction model based on time window

Cited by: 0
Authors
Ma H. [1, 2]
Yin M. [3]
He Q. [1]
Shi Z. [1]
Affiliations
[1] The Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences
[2] Graduate University of Chinese Academy of Sciences
[3] Internet Service Lab, Lenovo Corporate Research
Keywords
Burst; Pervasiveness; TF-PDF; Time window; Topic extraction
DOI
10.3772/j.issn.1002-0470.2010.06.007
Abstract
This paper studies topic organization in the news domain and presents a novel dynamic hot topic extraction model based on a time window. The model combines two characteristics of hot topics. The first is the pervasiveness of topic terms in news texts, measured by how often the terms are reported across different channels: the more frequently a term is reported, the higher its pervasiveness. The second is the burst of topic terms in the news stream, assessed by how abnormal a term's occurrence frequency in a specific interval is compared with other time intervals. A time window is introduced for burst detection, and term frequency-proportional document frequency (TF-PDF) weighting is combined with it to weight the terms. Experimental results demonstrate that the model is effective for topic extraction from news texts.
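The two signals described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `tf_pdf` function follows the TF-PDF weighting as it is usually stated for Bun and Ishizuka's algorithm (per-channel normalized term frequency multiplied by the exponential of the term's document proportion, summed over channels). The `burst_score` function is a hypothetical stand-in, since the abstract does not give the paper's burst formula: it is assumed here to be the smoothed ratio of a term's frequency in the current window to its mean frequency in the other windows. The function names, the input layout, and the `smoothing` parameter are all illustrative assumptions.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF-PDF weights.

    channels: dict mapping a channel name to a list of documents,
              where each document is a list of terms.
    Returns a dict mapping each term to its weight summed over channels.
    """
    weights = Counter()
    for docs in channels.values():
        if not docs:
            continue  # a channel with no documents contributes nothing
        # Frequency of each term within this channel.
        term_freq = Counter(t for doc in docs for t in doc)
        # Number of documents in this channel that contain each term.
        doc_freq = Counter(t for doc in docs for t in set(doc))
        # L2 norm of the channel's term frequencies, used for normalization.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        for term, f in term_freq.items():
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


def burst_score(window_counts, current, smoothing=1.0):
    """Hypothetical burst measure (an assumption, not the paper's formula).

    window_counts: per-window frequencies of one term, in time order.
    current: index of the window being tested for a burst.
    Returns the smoothed ratio of the current frequency to the mean
    frequency of all other windows; values well above 1 suggest a burst.
    """
    others = [c for i, c in enumerate(window_counts) if i != current]
    baseline = sum(others) / len(others) if others else 0.0
    return (window_counts[current] + smoothing) / (baseline + smoothing)


# Toy data: two channels, three short documents.
channels = {
    "channel_a": [["quake", "relief"], ["quake"]],
    "channel_b": [["quake", "election"]],
}
weights = tf_pdf(channels)
quake_burst = burst_score([1, 1, 7, 1], current=2)
```

In this toy data, "quake" appears in every document of both channels, so it receives the largest TF-PDF weight, and its jump from 1 to 7 occurrences in the third window yields a burst score well above 1. How the two scores are combined into a final term weight is left open here because the abstract does not specify it.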
Pages: 590-595
Page count: 5