On ontology-driven document clustering using core semantic features

Authors
Samah Fodeh
Bill Punch
Pang-Ning Tan
Affiliations
[1] Yale University
[2] Michigan State University
Keywords
Clustering; Information gain; Semantic features; Ontology; Dimensionality reduction
Abstract
Incorporating semantic knowledge from an ontology into document clustering is an important but challenging problem. While numerous methods have been developed, the value of using such an ontology is still not clear. We show in this paper that an ontology can be used to greatly reduce the number of features needed to do document clustering. Our hypothesis is that polysemous and synonymous nouns are both relatively prevalent and fundamentally important for document cluster formation. We show that nouns can be efficiently identified in documents and that this alone provides improved clustering. We next show the importance of the polysemous and synonymous nouns in clustering and develop a unique approach that allows us to measure the information gain in disambiguating these nouns in an unsupervised learning setting. In so doing, we can identify a core subset of semantic features that represent a text corpus. Empirical results show that by using core semantic features for clustering, one can reduce the number of features by 90% or more and still produce clusters that capture the main themes in a text corpus.
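The abstract does not spell out how information gain is measured in the unsupervised setting. As a minimal sketch, assuming that cluster assignments are treated as pseudo-labels (an assumption made here for illustration, not the authors' stated procedure), the information gain of a binary feature such as "document contains a given noun sense" can be computed as the drop in label entropy after splitting on that feature. The function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(has_feature, labels):
    """Entropy reduction from splitting documents on a binary feature.

    has_feature: list of bools, one per document (assumed: noun sense present or not)
    labels:      list of pseudo-labels, one per document (assumed: cluster ids)
    """
    total = len(labels)
    if total == 0:
        return 0.0
    with_f = [lab for f, lab in zip(has_feature, labels) if f]
    without_f = [lab for f, lab in zip(has_feature, labels) if not f]
    conditional = (
        (len(with_f) / total) * entropy(with_f)
        + (len(without_f) / total) * entropy(without_f)
    )
    return entropy(labels) - conditional


# Toy data: four documents in two clusters.
clusters = [0, 0, 1, 1]
separating_feature = [True, True, False, False]    # aligns with the clusters
uninformative_feature = [True, False, True, False]  # independent of the clusters

gain_separating = information_gain(separating_feature, clusters)
gain_uninformative = information_gain(uninformative_feature, clusters)
```

Under this reading, features with high gain would be candidates for a core feature subset, while features whose gain is near zero could be dropped.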
Pages: 395-421
Page count: 26