A semantic approach for text clustering using WordNet and lexical chains

Cited by: 153
Authors
Wei, Tingting [1]
Lu, Yonghe [3]
Chang, Huiyou [2]
Zhou, Qiang [1]
Bao, Xianyu [4]
Affiliations
[1] Sun Yat Sen Univ, Dept Informat Sci & Technol, Guangzhou 510275, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Dept Software, Guangzhou 510275, Guangdong, Peoples R China
[3] Sun Yat Sen Univ, Dept Informat Management, Guangzhou 510275, Guangdong, Peoples R China
[4] Shenzhen Acad Inspect & Quarantine, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text clustering; WordNet; Lexical chains; Core semantic features; CONTEXT;
DOI
10.1016/j.eswa.2014.10.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional clustering algorithms do not consider the semantic relationships among words and therefore cannot accurately represent the meaning of documents. To overcome this problem, semantic information from ontologies such as WordNet has been widely introduced to improve the quality of text clustering. However, several challenges remain, such as synonymy and polysemy, high dimensionality, extracting core semantics from texts, and assigning appropriate descriptions to the generated clusters. In this paper, we report our attempt at integrating WordNet with lexical chains to alleviate these problems. The proposed approach exploits the ontology's hierarchical structure and relations to provide a more accurate assessment of the similarity between terms for word sense disambiguation. Furthermore, we introduce lexical chains to extract a set of semantically related words from texts, which can represent the semantic content of the texts. Although lexical chains have been extensively used in text summarization, their potential impact on the text clustering problem has not been fully investigated. Our integrated approach can identify the theme of documents based on the disambiguated core features extracted, and at the same time reduce the dimensionality of the feature space. Experimental results using the proposed framework on Reuters-21578 show that clustering performance improves significantly compared to several classical methods. (C) 2014 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
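The idea of grouping semantically related words into lexical chains and keeping only the strongest chains as features can be illustrated with a minimal sketch. This is not the authors' algorithm: the taxonomy `HYPERNYM`, the functions `path_similarity`, `build_chains`, and `core_features`, and the threshold values are all hypothetical stand-ins chosen for illustration. A small hand-written hypernym tree replaces WordNet, and a simple path-based score replaces whatever similarity measure the paper uses.

```python
# Hypothetical toy taxonomy (child -> parent) standing in for WordNet hypernym links.
HYPERNYM = {
    "puppy": "dog",
    "dog": "animal",
    "cat": "animal",
    "animal": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "entity",
}


def ancestors(word):
    """Return the path from a word up to its root, starting with the word itself."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(a, b):
    """Score 1 / (1 + shortest path length) in the toy tree; 0.0 if unrelated."""
    depth_in_b = {node: j for j, node in enumerate(ancestors(b))}
    for i, node in enumerate(ancestors(a)):
        if node in depth_in_b:  # first shared node is the lowest common ancestor
            return 1.0 / (1 + i + depth_in_b[node])
    return 0.0


def build_chains(words, threshold=0.3):
    """Greedily attach each word to the most similar existing chain, else start a new one."""
    chains = []
    for word in words:
        best_chain, best_score = None, threshold
        for chain in chains:
            score = max(path_similarity(word, member) for member in chain)
            if score >= best_score:
                best_chain, best_score = chain, score
        if best_chain is None:
            chains.append([word])
        else:
            best_chain.append(word)
    return chains


def core_features(chains, min_len=2):
    """Keep only words from chains with at least min_len members (assumed strength criterion)."""
    return [word for chain in chains if len(chain) >= min_len for word in chain]


chains = build_chains(["dog", "car", "cat", "truck", "puppy", "bank"])
features = core_features(chains)
```

In this toy run the six input words fall into an animal chain, a vehicle chain, and a singleton chain for the out-of-taxonomy word, and dropping the singleton shrinks the feature set. Using chain length as the strength criterion is an assumption made here for brevity.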
Pages: 2264-2275
Page count: 12
Related papers
50 in total
  • [1] Web Document Clustering Approach using WordNet Lexical Categories and Fuzzy Clustering
    Gharib, Tarek F.
    Fouad, Mohammed M.
    Aref, Mostafa M.
    [J]. 2008 11TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY: ICCIT 2008, VOLS 1 AND 2, 2008, : 55 - +
  • [2] WordNet-based lexical semantic classification for text corpus analysis
    Long Jun
    Wang Lu-da
    Li Zu-de
    Zhang Zu-ping
    Yang Liu
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2015, 22 (05) : 1833 - 1840
  • [3] WordNet-based lexical semantic classification for text corpus analysis
    Jun Long
    Lu-da Wang
    Zu-de Li
    Zu-ping Zhang
    Liu Yang
    [J]. Journal of Central South University, 2015, 22 : 1833 - 1840
  • [4] A WordNet-based Semantic Model for Enhancing Text Clustering
    Shehata, Shady
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 477 - 482
  • [5] Evaluation of Text Clustering Methods Using WordNet
    Amine, Abdelmalek
    Elberrichi, Zakaria
    Simonet, Michel
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2010, 7 (04) : 349 - 357
  • [6] WordNet and Semantic Similarity based Approach for Document Clustering
    Desai, Sneha S.
    Laxminarayana, J. A.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016, : 312 - 317
  • [7] Efficient Hybrid Semantic Text Similarity using Wordnet and a Corpus
    Atoum, Issa
    Otoom, Ahmed
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (09) : 124 - 130
  • [8] An Approach to Automatic Text Summarization using WordNet
    Pal, Alok Ranjan
    Saha, Diganta
    [J]. SOUVENIR OF THE 2014 IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2014, : 1169 - 1173
  • [9] Semantic Document Clustering Using Information from WordNet and DBPedia
    Stanchev, Lubomir
    [J]. 2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2018, : 100 - 107
  • [10] A new unsupervised method for document clustering by using WordNet lexical and conceptual relations
    Recupero, Diego Reforgiato
    [J]. INFORMATION RETRIEVAL, 2007, 10 (06) : 563 - 579