Semantic Document Clustering Using a Similarity Graph

Cited by: 9
Author
Stanchev, Lubomir [1]
Affiliation
[1] Calif Polytech State Univ San Luis Obispo, Dept Comp Sci, San Luis Obispo, CA 93407 USA
Keywords
MODEL;
DOI
10.1109/ICSC.2016.8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document clustering addresses the problem of identifying groups of similar documents without human supervision. Unlike most existing solutions, which cluster documents based on keyword matching, we propose an algorithm that considers the meaning of the terms in the documents. For example, a document that contains the words "dog" and "cat" multiple times may be placed in the same category as a document that contains the word "pet", even if the two documents share only noise words. Our semantic clustering algorithm is based on a similarity graph that stores the degree of semantic relationship between terms (extracted from WordNet), where a term can be a word or a phrase. We experimentally validate our algorithm on the Reuters-21578 benchmark, which contains 11,362 newswire stories that are grouped into 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using two similarity metrics: one based on keyword matching and one that uses the similarity graph. We show that the second approach produces higher precision and recall, which means that it more closely matches the results of the human study.
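The "dog"/"cat" versus "pet" example in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the graph weights, the noise-word list, the one-hop expansion step, and all function names below are assumptions made up for illustration, whereas the paper derives its term relationships from WordNet. The sketch only shows why a keyword-matching metric scores the two documents as unrelated while a graph-aware metric does not.

```python
import math
from collections import Counter

# Hypothetical toy similarity graph: edge weights in [0, 1] between terms.
# These values are invented; the paper extracts its relationships from WordNet.
SIMILARITY_GRAPH = {
    "dog": {"pet": 0.8},
    "cat": {"pet": 0.8},
    "pet": {"dog": 0.8, "cat": 0.8},
}

# Hypothetical noise-word list for this example only.
NOISE_WORDS = {"the", "a", "is", "and", "my"}


def term_vector(text):
    """Return bag-of-words counts with noise words dropped."""
    return Counter(w for w in text.lower().split() if w not in NOISE_WORDS)


def expand(vector):
    """Spread each term's count to its one-hop graph neighbours (assumed scheme)."""
    expanded = Counter(vector)
    for term, count in vector.items():
        for neighbour, weight in SIMILARITY_GRAPH.get(term, {}).items():
            expanded[neighbour] += count * weight
    return expanded


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_a = "the dog and the cat and the dog"
doc_b = "my pet is a pet"

# Keyword matching: the documents share no non-noise terms.
keyword_sim = cosine(term_vector(doc_a), term_vector(doc_b))

# Graph-aware matching: "dog" and "cat" both contribute weight to "pet".
semantic_sim = cosine(expand(term_vector(doc_a)), expand(term_vector(doc_b)))

print(f"keyword similarity:  {keyword_sim:.3f}")
print(f"semantic similarity: {semantic_sim:.3f}")
```

Either similarity function could then serve as the document-to-document metric inside a clustering routine such as k-means, which is the comparison the abstract describes.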
Pages: 1 - 8
Page count: 8