Ontology Based Document Clustering - An Efficient Hybrid Approach

Cited by: 0
Authors
Jasila, E. K. [1 ]
Saleena, N. [1 ]
Nazeer, Abdul K. A. [1 ]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, Calicut, Kerala, India
Keywords
Document clustering; Text clustering; Ontology; WordNet; Red Black Tree;
DOI
10.1109/iacc48062.2019.8971594
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Recent research results show that ontology can be used to improve the accuracy of document clustering. Previous studies mainly focused on using ontology in the preprocessing of text documents. In this paper, we propose a hybrid approach that addresses both the preprocessing task and the clustering algorithm. The objectives are to reduce the number of features and the execution time, eliminate synonymy problems, and enhance clustering accuracy. Cosine similarity is used as the similarity measure. The preprocessing part uses a WordNet ontology based feature extraction method. In clustering, the initial centroids are found by applying a Red Black Tree based sorting method. The data points are allocated to suitable clusters using a novel approach that maintains the path of similarity between data points and the nearest cluster centroids. Some existing clustering algorithms using cosine similarity are compared experimentally with our novel clustering technique. Results show that the proposed hybrid approach performs better on the Newsgroup dataset, with considerable improvements in dimensionality reduction, running time, and accuracy.
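The abstract names cosine similarity as the similarity measure but gives no formula. The following is a minimal sketch of that measure and of assigning a document to its most similar centroid. It assumes raw term-frequency vectors and whitespace tokenization, and the helper names (`term_vector`, `cosine_similarity`, `assign_to_nearest`) are hypothetical. It does not reproduce the paper's WordNet based feature extraction, its Red Black Tree based centroid initialization, or its similarity-path allocation scheme.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (assumption: whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def assign_to_nearest(doc, centroids):
    """Return the index of the centroid with the highest cosine similarity."""
    return max(range(len(centroids)),
               key=lambda i: cosine_similarity(doc, centroids[i]))


# Illustrative data only, not taken from the Newsgroup dataset.
centroids = [term_vector("hockey game team"),
             term_vector("image graphics card")]
doc = term_vector("graphics image rendering")
cluster = assign_to_nearest(doc, centroids)
```

Because cosine similarity depends only on the angle between vectors, documents of different lengths with the same term proportions score as identical, which is one common reason it is preferred over Euclidean distance for text.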
Pages: 153 - 157
Number of pages: 5
Related papers
50 in total
  • [31] A novel ant-based clustering approach for document clustering
    He, Yulan
    Hui, Sin Cheung
    Sim, Yongxiang
    [J]. INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2006, 4182 : 537 - 544
  • [32] An Ontology-Based Reasoning Approach for Document Annotation
    Fontes, Celso Araujo
    Cavalcanti, Maria Claudia
    Moura, Ana Maria de C.
    [J]. 2013 IEEE SEVENTH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2013), 2013, : 160 - 167
  • [33] Hybrid clustering approach for term partitioning in document data sets
    Dept. of Computer Science and Engineering, GITAM, Visakhapatnam, India
    [Unknown]
    [Unknown]
    [J]. J. Digit. Inf. Manage., 2008, 3 (272-277): : 272 - 277
  • [34] An approach to document clustering based on system relevance
    Desai, M
    Spink, A
    [J]. ASIST 2004: PROCEEDINGS OF THE 67TH ASIS&T ANNUAL MEETING, VOL 41, 2004: MANAGING AND ENHANCING INFORMATION: CULTURES AND CONFLICTS, 2004, 41 : 256 - 266
  • [35] Document clustering based on semantic smoothing approach
    Liu, Yubao
    Cai, Jiarong
    Yin, Jian
    Huang, Zhilan
    [J]. ADVANCES IN INTELLIGENT WEB MASTERING, 2007, 43 : 217 - +
  • [36] Distributed Document Clustering Analysis Based on a Hybrid Method
    Judith, J. E.
    Jayakumari, J.
    [J]. CHINA COMMUNICATIONS, 2017, 14 (02) : 131 - 142
  • [37] Distributed Document Clustering Analysis Based on a Hybrid Method
    J.E.Judith
    J.Jayakumari
    [J]. China Communications, 2017, 14 (02) : 131 - 142
  • [38] Efficient prediction-based validation for document clustering
    Greene, Derek
    Cunningham, Padraig
    [J]. MACHINE LEARNING: ECML 2006, PROCEEDINGS, 2006, 4212 : 663 - 670
  • [39] Efficient Incremental Phrase-Based Document Clustering
    Bakr, Ahmad M.
    Yousri, Noha A.
    Ismail, Mohamed A.
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 517 - 520
  • [40] Efficient phrase-based document similarity for clustering
    Chim, Hung
    Deng, Xiaotie
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (09) : 1217 - 1229