On ontology-driven document clustering using core semantic features

Authors
Samah Fodeh
Bill Punch
Pang-Ning Tan
Institutions
[1] Yale University
[2] Michigan State University
Keywords
Clustering; Information gain; Semantic features; Ontology; Dimensionality reduction
Abstract
Incorporating semantic knowledge from an ontology into document clustering is an important but challenging problem. Although numerous methods have been developed, the value of using such an ontology is still unclear. In this paper we show that an ontology can be used to greatly reduce the number of features needed for document clustering. Our hypothesis is that polysemous and synonymous nouns are both relatively prevalent and fundamentally important for forming document clusters. We show that nouns can be identified efficiently in documents and that restricting the features to nouns alone improves clustering. We then show the importance of polysemous and synonymous nouns in clustering and develop a novel approach that measures the information gain obtained by disambiguating these nouns in an unsupervised learning setting. In doing so, we identify a core subset of semantic features that represents a text corpus. Empirical results show that by clustering with these core semantic features, one can reduce the number of features by 90% or more and still produce clusters that capture the main themes of a text corpus.
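The abstract does not give the formula used to score disambiguated nouns, so the following is only a rough, generic illustration of ranking features by information gain in an unsupervised setting. It assumes that a preliminary clustering supplies pseudo-labels, that each disambiguated noun sense (written here with hypothetical tokens such as `bank#finance`) is a binary present/absent feature, and that the "core" features are simply the top k by information gain. The function names `entropy`, `information_gain`, and `select_core_features` are hypothetical; this is a standard information-gain ranker, not the authors' algorithm.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, clusters, feature):
    """Information gain of the binary feature 'feature occurs in the document'
    with respect to cluster labels.

    docs: one set of features (e.g. sense-tagged nouns) per document.
    clusters: pseudo-label per document, ASSUMED to come from a prior clustering.
    """
    with_f = [c for d, c in zip(docs, clusters) if feature in d]
    without_f = [c for d, c in zip(docs, clusters) if feature not in d]
    n = len(clusters)
    # Conditional entropy of the labels after splitting on the feature.
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(clusters) - conditional


def select_core_features(docs, clusters, k):
    """Return the k features with the highest information gain.

    The vocabulary is sorted first so that ties are broken alphabetically
    (Python's sort is stable).
    """
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda f: information_gain(docs, clusters, f), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus: two senses of "bank" separate the two pseudo-clusters.
    docs = [
        {"bank#finance", "loan"},
        {"bank#finance", "interest"},
        {"bank#river", "water"},
        {"bank#river", "fish"},
    ]
    clusters = [0, 0, 1, 1]
    print(select_core_features(docs, clusters, 2))  # ['bank#finance', 'bank#river']
```

On this toy input, each sense of "bank" splits the documents perfectly (information gain of 1 bit), while each context word scores about 0.31 bits, so keeping only the two sense features already preserves the cluster structure. This mirrors, in miniature, the idea of retaining a small core of semantic features.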
Pages: 395-421
Number of pages: 26
Source: Knowledge and Information Systems, 2011, 28(2)