A Wikipedia-based Semantic Model for Text Clustering

Cited by: 0

Authors:

Zhou, Jing-min [1]

Cui, Qing-jun [1]

Zhang, Hui [1]

Affiliations:

[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China

Source:

2011 INTERNATIONAL CONFERENCE ON FUTURE COMPUTER SCIENCE AND APPLICATION (FCSA 2011), VOL 2 | 2011

Keywords:

Wikipedia; semantic; Semantic-rank algorithm; semantic attractive force; text clustering

DOI:

Not available

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Taking advantage of the accuracy and comprehensiveness of Wikipedia, we mined semantic knowledge from Wikipedia abstracts and introduced a Wikipedia-based semantic model for text clustering. In this model, words or phrases that are closely related in Wikipedia abstracts are gathered into semantic groups, which we define as "semantic clusters" in this paper. The proposed semantic model also contains a Semantic-rank algorithm that computes the significance of the words or phrases in a semantic cluster. Inspired by the phenomenon that a source charge exerts an electric force on a test charge, we introduced a new concept called "semantic attractive force" between a semantic cluster and a document. We applied the formula of semantic attractive force to the process of text clustering and thereby completed the Wikipedia-based semantic text clustering. Experimental results demonstrate that, compared with traditional keyword-based text clustering, the newly developed semantic model improves the quality of both the clusters and the cluster labels.
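
The abstract names the pipeline stages but gives no formulas, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that term significance can be approximated by a PageRank-style iteration over a symmetric co-occurrence graph, and that the "semantic attractive force" can be approximated by a Coulomb-like product of two "charges" divided by a squared "distance". The function names (`rank_terms`, `attractive_force`, `assign`), the charge and distance definitions, and the toy data are all hypothetical.

```python
from collections import Counter


def rank_terms(graph, damping=0.85, iterations=100):
    """PageRank-style scores on a symmetric weighted term graph (assumed form).

    graph maps each term to {neighbor: co-occurrence weight}; every term is
    assumed to have at least one neighbor, and edges are listed both ways.
    """
    n = len(graph)
    scores = {term: 1.0 / n for term in graph}
    out_weight = {term: sum(nbrs.values()) for term, nbrs in graph.items()}
    for _ in range(iterations):
        updated = {}
        for term, nbrs in graph.items():
            # Each neighbor passes on a share of its score proportional to
            # the weight of the edge pointing back at this term.
            received = sum(
                scores[nbr] * graph[nbr][term] / out_weight[nbr] for nbr in nbrs
            )
            updated[term] = (1.0 - damping) / n + damping * received
        scores = updated
    return scores


def attractive_force(term_scores, doc_tokens):
    """Coulomb-like stand-in: charge * charge / distance^2 (all assumptions)."""
    if not doc_tokens:
        return 0.0
    counts = Counter(doc_tokens)
    # Hypothetical "cluster charge": total score of cluster terms in the doc.
    cluster_charge = sum(s for t, s in term_scores.items() if counts[t] > 0)
    # Hypothetical "document charge": occurrences of cluster terms in the doc.
    doc_charge = sum(counts[t] for t in term_scores)
    # Hypothetical "distance" in [1, 2]: shrinks as cluster terms cover the doc.
    distance = 2.0 - doc_charge / len(doc_tokens)
    return cluster_charge * doc_charge / distance ** 2


def assign(doc_tokens, ranked_clusters):
    """Attach the document to the cluster exerting the largest force."""
    return max(
        ranked_clusters,
        key=lambda name: attractive_force(ranked_clusters[name], doc_tokens),
    )


# Toy co-occurrence graphs standing in for two semantic clusters.
graphs = {
    "clustering": {
        "cluster": {"document": 2.0, "algorithm": 1.0},
        "document": {"cluster": 2.0},
        "algorithm": {"cluster": 1.0},
    },
    "physics": {
        "charge": {"force": 1.0, "electric": 1.0},
        "force": {"charge": 1.0},
        "electric": {"charge": 1.0},
    },
}
ranked = {name: rank_terms(graph) for name, graph in graphs.items()}
doc = "the cluster algorithm groups each document into a cluster".split()
label = assign(doc, ranked)
```

In this toy setup the document shares terms only with the first graph, so `assign` picks that cluster. A real system would build the graphs from Wikipedia abstracts and would use the force formula defined in the paper itself.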


Pages: 413-416

Page count: 4

References (7 in total):

[1] [Anonymous], MODERN INFORM
[2] Liu Yufang, 2010, P INT C COMP CONTR I
[3] Page L., 1998, PAGERANKCITATION RAN
[4] Rocchio J. J., 1966, THESIS
[5] Shehata Shady, 2009, P IEEE INT C DAT MIN, P46
[6] VOORHEES, EM. IMPLEMENTING AGGLOMERATIVE HIERARCHICAL-CLUSTERING ALGORITHMS FOR USE IN DOCUMENT-RETRIEVAL [J]. INFORMATION PROCESSING & MANAGEMENT, 1986, 22 (06): 465-476
[7] Zamir O., 1998, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P46, DOI 10.1145/290941.290956
