Enhancing Text Clustering Performance Using Semantic Similarity

Cited by: 0
Authors
Gad, Walaa K. [1]
Kamel, Mohamed S. [1]
Affiliations
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
Source
Keywords
Semantic similarity measures; Adapted Lesk algorithm; Word sense disambiguation; WordNet;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text document clustering can be challenging due to the complex linguistic properties of text documents. Most clustering techniques are based on the traditional bag-of-words model to represent the documents. In such a document representation, ambiguity, synonymy and semantic similarities may not be captured using traditional text mining techniques that are based on word and/or phrase frequencies in the text. In this paper, we propose a semantic similarity based model to capture the semantics of the text. The proposed model, in conjunction with a lexical ontology, solves the synonym and hypernym problems. It utilizes WordNet as an ontology and uses the adapted Lesk algorithm to examine and extract the relationships between terms. The proposed model reflects these relationships through semantic weights added to the term frequency weight to represent the semantic similarity between terms. Experiments using the proposed semantic similarity based model in text clustering are conducted. The obtained results show promising performance improvements compared to the traditional vector space model as well as other existing methods that include semantic similarity measures in text clustering.
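The idea of adding semantic weights to term frequency weights can be illustrated with a minimal Python sketch. It is not the authors' implementation: the weighting formula (term frequency plus relatedness-scaled frequencies of the other terms), the function names, and the `TOY_RELATEDNESS` table are all assumptions made for illustration. In particular, the hand-written table stands in for relatedness scores that the paper obtains from WordNet with the adapted Lesk algorithm.

```python
from collections import Counter
from math import sqrt

# Hypothetical relatedness scores in [0, 1]. In the paper these would come
# from the adapted Lesk algorithm over WordNet; here they are made up.
TOY_RELATEDNESS = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("car", "vehicle")): 0.6,
    frozenset(("automobile", "vehicle")): 0.6,
}


def relatedness(a, b):
    """Look up the toy relatedness of two terms (0.0 if the pair is unknown)."""
    if a == b:
        return 1.0
    return TOY_RELATEDNESS.get(frozenset((a, b)), 0.0)


def plain_weights(tokens, vocabulary):
    """Traditional vector space model: raw term frequency per vocabulary term."""
    tf = Counter(tokens)
    return {term: float(tf[term]) for term in vocabulary}


def semantic_weights(tokens, vocabulary):
    """Assumed scheme: term frequency plus the frequencies of the other
    vocabulary terms, each scaled by its relatedness to this term."""
    tf = Counter(tokens)
    weights = {}
    for term in vocabulary:
        boost = sum(
            relatedness(term, other) * tf[other]
            for other in vocabulary
            if other != term
        )
        weights[term] = tf[term] + boost
    return weights


def cosine(u, v):
    """Cosine similarity of two weight dictionaries sharing the same keys."""
    dot = sum(u[t] * v[t] for t in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vocabulary = ["car", "automobile", "vehicle", "banana"]
doc_a = ["car", "car", "vehicle"]
doc_b = ["automobile", "vehicle"]

plain_sim = cosine(plain_weights(doc_a, vocabulary),
                   plain_weights(doc_b, vocabulary))
semantic_sim = cosine(semantic_weights(doc_a, vocabulary),
                      semantic_weights(doc_b, vocabulary))
print(f"plain TF cosine:      {plain_sim:.3f}")
print(f"semantic-weight cosine: {semantic_sim:.3f}")
```

In this toy setup the two documents share only the word "vehicle", so the plain term frequency vectors look fairly dissimilar, while the semantic weights let "car" and "automobile" reinforce each other and raise the similarity. Clustering would then run on the semantically weighted vectors in place of the raw bag-of-words vectors.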
Pages: 325 - 335
Number of pages: 11