Ontology Based Document Clustering - An Efficient Hybrid Approach

Cited by: 0
Authors
Jasila, E. K. [1 ]
Saleena, N. [1 ]
Nazeer, Abdul K. A. [1 ]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, Calicut, Kerala, India
Keywords
Document clustering; Text clustering; Ontology; WordNet; Red Black Tree;
DOI
10.1109/iacc48062.2019.8971594
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
Recent research shows that ontologies can improve the accuracy of document clustering. Previous studies mainly applied ontology to the preprocessing of text documents. In this paper, we propose a hybrid approach that addresses both the preprocessing task and the clustering algorithm. The objective is to reduce the number of features and the execution time, eliminate synonymy problems, and enhance clustering accuracy. Cosine similarity is used as the similarity measure. The preprocessing part uses a WordNet ontology based feature extraction method. In clustering, the initial centroids are found by applying a Red Black Tree based sorting method. Data points are then allocated to suitable clusters using a novel approach that maintains the path of similarity between data points and their nearest cluster centroids. We compare experimental results for several existing clustering algorithms using cosine similarity against our clustering technique. The results show that the proposed hybrid approach performs better on the Newsgroup dataset, with considerable improvements in dimensionality reduction, running time, and accuracy.
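The abstract names two building blocks that can be illustrated in a few lines: collapsing synonyms into one feature during preprocessing, and comparing documents with cosine similarity. The following is only a minimal sketch under stated assumptions, not the authors' method. `TOY_SYNONYMS` is a hypothetical hand-written stand-in for a WordNet lookup, and the function names are invented for illustration. The Red Black Tree based centroid initialization and the path-of-similarity allocation are not reproduced here, because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup (assumption, not from the paper):
# maps each word to one representative term for its synonym set.
TOY_SYNONYMS = {
    "car": "automobile",
    "auto": "automobile",
    "picture": "image",
    "photo": "image",
}


def extract_features(text):
    """Build a term-frequency vector, merging synonyms into a single feature."""
    tokens = text.lower().split()
    return Counter(TOY_SYNONYMS.get(token, token) for token in tokens)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_centroid(doc, centroids):
    """Return the index of the centroid most similar to the document."""
    return max(
        range(len(centroids)),
        key=lambda i: cosine_similarity(doc, centroids[i]),
    )
```

In this sketch, "car photo" and "auto picture" map to the same two features, so merging synonyms both shrinks the vocabulary and makes the two documents compare as near-identical under cosine similarity.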
Pages: 153 - 157
Page count: 5
Related papers
50 in total
  • [21] Study of ontology or thesaurus based document clustering and information retrieval
    Bharathi, G.
    Venkatesan, D.
    [J]. Journal of Theoretical and Applied Information Technology, 2012, 40 (01) : 55 - 61
  • [22] An Efficient Document Clustering Approach for Devising Semantic Clusters
    Jasila, E. K.
    Saleena, N.
    Abdul Nazeer, K. A.
    [J]. CYBERNETICS AND SYSTEMS, 2023,
  • [23] Fuzzy Clustering based Approach for Ontology Alignment
    Idoudi, Rihab
    Ettabaa, Karim Saheb
    Hamrouni, Kamel
    Solaiman, Basel
    [J]. PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS, VOL 1 (ICEIS), 2016, : 594 - 599
  • [24] A Clustering-Based Approach to Ontology Alignment
    Duan, Songyun
    Fokoue, Achille
    Srinivas, Kavitha
    Byrne, Brian
    [J]. SEMANTIC WEB - ISWC 2011, PT I, 2011, 7031 : 146 - +
  • [25] Analysis of ontology based approach for clustering tasks
    Grabusts, Peter
    [J]. AICT 2013: APPLIED INFORMATION AND COMMUNICATION TECHNOLOGIES, 2013, : 10 - 17
  • [26] Performance Evaluation of Semantic Based and Ontology Based Text Document Clustering Techniques
    Punitha, S. C.
    Punithavalli, M.
    [J]. INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY AND SYSTEM DESIGN 2011, 2012, 30 : 100 - 106
  • [27] Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology
    Selvalakshmi, B.
    Subramaniam, M.
    Sathiyasekar, K.
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (09) : 3102 - 3119
  • [28] Document Clustering Using an Ontology-Based Vector Space Model
    Costa, Ruben
    Lima, Celson
    [J]. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2015, 5 (03) : 39 - 60
  • [29] An Improvised Sub-Document Based Framework for Efficient Document Clustering
    Memon, Muhammad Qasim
    He, Jingsha
    Lu, Yu
    Zhu, Nafei
    Memon, Aasma
    [J]. JOURNAL OF INTERNET TECHNOLOGY, 2019, 20 (04) : 1191 - 1203
  • [30] Efficient phrase-based document indexing for web document clustering
    Hammouda, KM
    Kamel, MS
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (10) : 1279 - 1296