Ontology Based Document Clustering - An Efficient Hybrid Approach

Cited by: 0
Authors
Jasila, E. K. [1]
Saleena, N. [1]
Nazeer, Abdul K. A. [1]
Affiliation
[1] Natl Inst Technol, Dept Comp Sci & Engn, Calicut, Kerala, India
Keywords
Document clustering; Text clustering; Ontology; WordNet; Red Black Tree;
DOI
10.1109/iacc48062.2019.8971594
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Recent research shows that ontology can be used to improve the accuracy of document clustering. Previous studies mainly focused on using ontology in the preprocessing of text documents. In this paper, we propose a hybrid approach that addresses both the preprocessing task and the clustering algorithm. The objective is to reduce the number of features and the execution time, eliminate synonymy problems, and enhance clustering accuracy. Cosine similarity is used as the similarity measure. The preprocessing part uses a WordNet ontology based feature extraction method. In clustering, the initial centroids are found by applying a Red Black Tree based sorting method. The data points are allocated to suitable clusters using a novel approach that maintains the path of similarity between data points and their nearest cluster centroids. We compare our novel clustering technique experimentally with some existing clustering algorithms that use cosine similarity. Results show that the proposed hybrid approach performs better on the Newsgroup dataset, with considerable improvements in dimensionality reduction, running time, and accuracy.
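As a rough illustration of two ideas named in the abstract (merging synonyms into one feature, and allocating documents to the nearest centroid by cosine similarity), the following is a minimal sketch and not the authors' method. The synonym table `SYNONYM_TO_CONCEPT` is a hypothetical stand-in for a WordNet lookup, the function names and toy documents are assumptions made for this example, and the Red Black Tree based centroid initialization and the path-of-similarity allocation are not reproduced here.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a WordNet lookup (assumption).
SYNONYM_TO_CONCEPT = {"automobile": "car", "auto": "car"}


def extract_features(text):
    """Map each token to a shared concept so synonyms count as one feature."""
    tokens = text.lower().split()
    return Counter(SYNONYM_TO_CONCEPT.get(t, t) for t in tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def assign_to_nearest(docs, centroids):
    """For each document, return the index of the most similar centroid."""
    return [
        max(range(len(centroids)),
            key=lambda k: cosine_similarity(doc, centroids[k]))
        for doc in docs
    ]


docs = [extract_features(t) for t in (
    "car engine wheel",
    "automobile engine road",
    "court judge law",
)]

# Initial centroids are picked by hand here purely for illustration.
centroids = [docs[0], docs[2]]
labels = assign_to_nearest(docs, centroids)
```

In this toy input, "automobile" and "car" collapse into a single feature, so the first two documents share two terms instead of one, which is the kind of synonymy and dimensionality effect the abstract describes.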
Pages: 153 - 157
Number of pages: 5