Enhancing Text Clustering Performance Using Semantic Similarity

Times cited: 0
Authors
Gad, Walaa K. [1]
Kamel, Mohamed S. [1]
Affiliation
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
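Source
Keywords
Semantic similarity measures; Adapted Lesk algorithm; Word sense disambiguation; WordNet
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract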
Clustering text documents can be challenging because of the complex linguistic properties of text. Most clustering techniques represent documents with the traditional bag-of-words model. In such a representation, ambiguity, synonymy, and semantic similarity may not be captured by traditional text mining techniques, which rely on word and/or phrase frequencies in the text. In this paper, we propose a semantic similarity based model to capture the semantics of the text. Used in conjunction with a lexical ontology, the proposed model solves the synonym and hypernym problems. It uses WordNet as the ontology and applies the adapted Lesk algorithm to examine and extract the relationships between terms. The model reflects these relationships through semantic weights that are added to the term frequency weights to represent the semantic similarity between terms. Text clustering experiments were conducted using the proposed model. The results show promising performance improvements over the traditional vector space model and over other existing methods that include semantic similarity measures in text clustering.
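The abstract says that semantic weights are added to term frequency weights but does not give the formula. The sketch below assumes one common form: each term's weight is its own frequency plus the frequencies of related terms in the same document, scaled by their pairwise similarity and counted only above a threshold. The similarity table `SIM`, the `threshold` value, the toy documents, and all helper names are hypothetical. In the paper the similarities come from WordNet via the adapted Lesk algorithm, which this toy does not implement.

```python
import math
from collections import Counter

# Hypothetical term-pair similarities in [0, 1]. The paper derives such values
# from WordNet with the adapted Lesk algorithm; this is a hand-picked toy value.
SIM = {
    frozenset(("car", "automobile")): 1.0,
}


def sim(a, b):
    """Symmetric lookup; pairs missing from SIM are treated as unrelated (0.0)."""
    return SIM.get(frozenset((a, b)), 0.0)


def tf_vector(tokens, vocab):
    """Plain bag-of-words term frequencies, for comparison."""
    tf = Counter(tokens)
    return [tf[t] for t in vocab]


def semantic_vector(tokens, vocab, threshold=0.5):
    """Assumed weighting: w(t) = tf(t) + sum of sim(t, u) * tf(u)
    over other terms u whose similarity to t is at least `threshold`."""
    tf = Counter(tokens)
    vec = []
    for t in vocab:
        extra = sum(sim(t, u) * tf[u] for u in vocab
                    if u != t and sim(t, u) >= threshold)
        vec.append(tf[t] + extra)
    return vec


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


doc1 = "the car needs new tires".split()
doc2 = "an automobile with new tires".split()
vocab = sorted(set(doc1) | set(doc2))

plain_sim = cosine(tf_vector(doc1, vocab), tf_vector(doc2, vocab))
semantic_sim = cosine(semantic_vector(doc1, vocab), semantic_vector(doc2, vocab))
print(f"bag of words: {plain_sim:.3f}, with semantic weights: {semantic_sim:.3f}")
```

With plain term frequencies the two toy documents share only "new" and "tires" (cosine 0.400). Crediting "car" and "automobile" as synonyms raises the cosine to about 0.667, which is the kind of effect the semantic weights are meant to have on document similarity during clustering.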
Pages: 325-335
Number of pages: 11
Related papers
50 in total
  • [1] Enhancing semantic text similarity with functional semantic knowledge (FOP) in patents
    Teng, Hao
    Wang, Nan
    Zhao, Hongyu
    Hu, Yingtong
    Jin, Haitao
    [J]. JOURNAL OF INFORMETRICS, 2024, 18 (01)
  • [2] New Semantic Similarity Based Model for Text Clustering Using Extended Gloss Overlaps
    Gad, Walaa K.
    Kamel, Mohamed S.
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, 2009, 5632 : 663 - 677
  • [3] Research on text similarity algorithm based on sentence semantic clustering
    Zhang, J.
    [J]. Binary Information Press, (10)
  • [4] Improved Semantic Similarity Method Based on HowNet for Text Clustering
    Nie, Hongmei
    Zhou, Jiaqing
    Guo, Qi
    Huang, Zhiqi
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 266 - 269
  • [5] Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity
    Zhu, Shanfeng
    Zeng, Jia
    Mamitsuka, Hiroshi
    [J]. BIOINFORMATICS, 2009, 25 (15) : 1944 - 1951
  • [6] A WordNet-based Semantic Model for Enhancing Text Clustering
    Shehata, Shady
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 477 - 482
  • [7] Assessing text semantic similarity using ontology
    Liu, Hongzhe
    Wang, Pengfei
    [J]. Academy Publisher, (09) : 490 - 497
  • [8] Genetic algorithm for text clustering using ontology and evaluating the validity of various semantic similarity measures
    Song, Wei
    Li, Cheng Hua
    Park, Soon Cheol
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (05) : 9095 - 9104
  • [9] Semantic Document Clustering Using a Similarity Graph
    Stanchev, Lubomir
    [J]. 2016 IEEE TENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2016, : 1 - 8
  • [10] Text Clustering Using Statistical and Semantic Data
    Benghabrit, Asmaa
    Ouhbi, Brahim
    Behja, Hicham
    Frikh, Bouchra
    [J]. WORLD CONGRESS ON COMPUTER & INFORMATION TECHNOLOGY (WCCIT 2013), 2013,