On ontology-driven document clustering using core semantic features

Authors
Samah Fodeh
Bill Punch
Pang-Ning Tan
Affiliations
[1] Yale University
[2] Michigan State University
Keywords
Clustering; Information gain; Semantic features; Ontology; Dimensionality reduction
DOI
Not available
Abstract
Incorporating semantic knowledge from an ontology into document clustering is an important but challenging problem. While numerous methods have been developed, the value of using such an ontology is still not clear. We show in this paper that an ontology can be used to greatly reduce the number of features needed for document clustering. Our hypothesis is that polysemous and synonymous nouns are both relatively prevalent and fundamentally important for document cluster formation. We show that nouns can be efficiently identified in documents and that this alone improves clustering. We next show the importance of polysemous and synonymous nouns in clustering and develop a unique approach that allows us to measure the information gain from disambiguating these nouns in an unsupervised learning setting. In so doing, we can identify a core subset of semantic features that represent a text corpus. Empirical results show that by using core semantic features for clustering, one can reduce the number of features by 90% or more and still produce clusters that capture the main themes in a text corpus.
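As a rough illustration of the information-gain idea mentioned in the abstract, the sketch below scores a binary noun feature by how much it reduces entropy over cluster assignments. It is not the authors' procedure: treating cluster assignments as pseudo-labels is an assumption made here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels; 0.0 for an empty list."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(has_feature, cluster_labels):
    """IG = H(C) - [P(f) * H(C | f) + P(not f) * H(C | not f)].

    has_feature: one boolean per document (does the noun sense occur?).
    cluster_labels: one cluster assignment per document, used here as
    pseudo-labels (an assumption, not taken from the paper).
    """
    total = len(cluster_labels)
    with_f = [c for f, c in zip(has_feature, cluster_labels) if f]
    without_f = [c for f, c in zip(has_feature, cluster_labels) if not f]
    conditional = (len(with_f) / total) * entropy(with_f) \
        + (len(without_f) / total) * entropy(without_f)
    return entropy(cluster_labels) - conditional


# Hypothetical toy corpus: four documents in two clusters.
clusters = ["finance", "finance", "nature", "nature"]
bank_as_institution = [True, True, False, False]  # disambiguated noun sense
generic_noun = [True, True, True, True]           # occurs in every document

ig_sense = information_gain(bank_as_institution, clusters)
ig_generic = information_gain(generic_noun, clusters)
```

In this toy setting the disambiguated sense separates the two clusters perfectly, while the noun that appears everywhere carries no information; ranking features by such a score and keeping only the top ones is one plausible way to arrive at a small core feature set.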
Pages: 395-421
Page count: 26