A Wikipedia-based Semantic Model for Text Clustering

Cited by: 0
Authors
Zhou, Jing-min [1]
Cui, Qing-jun [1]
Zhang, Hui [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
Keywords
Wikipedia; semantic; Semantic-rank algorithm; semantic attractive force; text clustering;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Taking advantage of the accuracy and comprehensiveness of Wikipedia, we mined semantic knowledge from Wikipedia abstracts and built a Wikipedia-based semantic model for text clustering. In this model, words or phrases that are closely related in Wikipedia abstracts are gathered into semantic groups, which we define as "semantic clusters". The model also includes a Semantic-rank algorithm that computes the significance of each word or phrase within a semantic cluster. Inspired by the way a source charge exerts an electric force on a test charge, we introduce a new concept, the "semantic attractive force" between a semantic cluster and a document. We apply the formula for semantic attractive force to the text clustering process to perform semantic text clustering based on Wikipedia. Experimental results show that, compared with traditional keyword-based text clustering, the proposed semantic model improves the quality of both the clustering and the cluster labels.
Pages: 413-416
Page count: 4