A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification

Cited by: 0
Authors
Vinay Kumar Kotte
Srinivasan Rajavelu
Elijah Blessing Rajsingh
Affiliations
[1] Karunya Institute of Technology and Sciences (Deemed to be University), Department of CSE
[2] Kakatiya Institute of Technology and Science, Department of CSE
[3] Karunya Institute of Technology and Sciences (Deemed to be University)
Source
Foundations of Science | 2020 / Vol. 25
Keywords
Classification; Clustering; Dimensionality; Feature selection; Feature reduction
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Text document classification and clustering are important learning tasks that belong to both data mining and machine learning. These tasks pose several challenges when high dimensional text documents must be processed. The word distribution in text documents plays a key role in the learning process. Research on high dimensional text document classification and clustering is usually limited to applying traditional distance functions, and most contributions in the existing literature do not consider the word distribution in documents. In this research, we propose a novel similarity function for feature pattern clustering and high dimensional text classification. The proposed similarity function is used to carry out supervised learning based dimensionality reduction. An important feature of this work is that the word distribution before and after dimensionality reduction is the same. Experimental results show that the proposed approach achieves dimensionality reduction, retains the word distribution, and obtains better classification accuracies than other measures.
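The abstract does not give the similarity function itself, so the following is only an illustrative sketch of the general idea of supervised feature clustering for dimensionality reduction. It assumes a plain cosine similarity as a stand-in for the proposed function, a hypothetical threshold of 0.9, and a toy term-count matrix; all function names are made up for illustration. Words whose class-conditional patterns are similar are merged, and merged columns are summed, so each document keeps the same total word count after reduction. That is one possible reading of "the word distribution before and after dimensionality reduction is the same", not necessarily the authors' definition.

```python
import math

def cosine(u, v):
    """Stand-in similarity (an assumption; NOT the paper's function)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def class_pattern(column, labels, classes):
    """Share of a word's occurrences that fall in each class."""
    total = sum(column)
    if total == 0:
        return [0.0 for _ in classes]
    return [sum(c for c, y in zip(column, labels) if y == k) / total
            for k in classes]

def cluster_features(X, labels, threshold=0.9):
    """Greedy single-pass clustering of word columns by pattern similarity."""
    classes = sorted(set(labels))
    n_words = len(X[0])
    patterns = [class_pattern([row[j] for row in X], labels, classes)
                for j in range(n_words)]
    clusters = []  # each cluster is a list of word indices
    for j in range(n_words):
        for members in clusters:
            # Compare against the first word that opened the cluster.
            if cosine(patterns[j], patterns[members[0]]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def reduce_matrix(X, clusters):
    """Sum the counts of clustered words, one new column per cluster."""
    return [[sum(row[j] for j in members) for members in clusters]
            for row in X]

# Toy data (hypothetical): 4 documents x 4 words, two classes.
X = [[2, 1, 0, 0],
     [1, 2, 0, 1],
     [0, 0, 3, 1],
     [0, 0, 1, 2]]
labels = [0, 0, 1, 1]

clusters = cluster_features(X, labels)
X_reduced = reduce_matrix(X, clusters)
```

In this toy case the four word columns collapse into two cluster columns, and every row of `X_reduced` sums to the same total as the corresponding row of `X`. Any classifier can then be trained on the reduced matrix.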
Pages: 1077 - 1094
Page count: 17
Related papers
50 in total
  • [1] A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification
    Kotte, Vinay Kumar
    Rajavelu, Srinivasan
    Rajsingh, Elijah Blessing
    [J]. FOUNDATIONS OF SCIENCE, 2020, 25 (04) : 1077 - 1094
  • [2] A Similarity Measure for Text Classification and Clustering
    Lin, Yung-Shen
    Jiang, Jung-Yi
    Lee, Shie-Jue
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2014, 26 (07) : 1575 - 1590
  • [3] Text Document Classification and Pattern Recognition
    Wu, Qin
    Fuller, Eddie
    Zhang, Cun-Quan
    [J]. 2009 INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, 2009, : 405 - 410
  • [4] Efficient text document clustering with new similarity measures
    Lakshmi R.
    Baskar S.
    [J]. International Journal of Business Intelligence and Data Mining, 2021, 18 (01) : 109 - 126
  • [5] An Intelligent Similarity Measure for Effective Text Document Clustering
    Aishwarya, M. L.
    Selvi, K.
    [J]. 2016 INTERNATIONAL CONFERENCE ON COMPUTING TECHNOLOGIES AND INTELLIGENT DATA ENGINEERING (ICCTIDE'16), 2016,
  • [6] An Improved Similarity Measure for Text Clustering and Classification
    Reddy, G. Suresh
    Kanth, T. V. Rajini
    Rao, A. Ananda
    [J]. ADVANCED SCIENCE LETTERS, 2015, 21 (11) : 3583 - 3590
  • [7] A Comment on "A Similarity Measure for Text Classification and Clustering"
    Nagwani, Naresh Kumar
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (09) : 2589 - 2590
  • [8] Similarity-Based Synthetic Document Representations for Meta-Feature Generation in Text Classification
    Canuto, Sergio
    Salles, Thiago
    Rosa, Thierson C.
    Goncalves, Marcos A.
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 355 - 364
  • [9] Document classification: An approach using feature clustering
    Harish, B.S.
    Udayasri, B.
    [J]. Advances in Intelligent Systems and Computing, 2014, 235 : 163 - 173
  • [10] A New Similarity Measure for Document Classification and Text Mining
    Eminagaoglu, Mete
    Goksen, Yilmaz
    [J]. ECONOMIES OF THE BALKAN AND EASTERN EUROPEAN COUNTRIES, 2020, : 353 - 366