An Efficient Productive Feature Selection and Document Clustering (PFS-DocC) Model for Document Clustering

Cited by: 0
Author
Pitchandi, Perumal [1]
Affiliation
[1] Sri Ramakrishna Engn Coll, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
Keywords
Benchmark standards; document clustering; productive feature selection; multiple clustering; web applications; RANKING; FRAMEWORK
DOI
10.14569/IJACSA.2022.0130415
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In data mining, document clustering aims to reduce the effective size of a document collection by constructing a clustering model, which is essential in many web-based applications. Over the past few decades, various mining approaches have been analysed and evaluated to improve document clustering; in most cases, however, the documents are disorganized, which reduces accuracy and degrades performance. The data instances need to be organized, and a productive summary has to be generated for each cluster. The summary, or description, of the documents should convey the information to users without any further analysis and should make the associated clusters easier to scan. This is done by identifying the most relevant and influential features for generating the clusters. This work proposes a novel approach, the Productive Feature Selection and Document Clustering (PFS-DocC) model. First, productive features are selected from the input dataset, DUC2004, which is a benchmark dataset. Next, document clustering is attempted for single and multiple clusters, where the generated output has to be more extractive and generic. The model provides more appropriate summaries that are well suited to web-based applications. The experiments are carried out on a benchmark dataset available online, and the evaluation shows that the proposed PFS-DocC model gives superior results with a higher ROUGE score.
Pages: 125-133
Page count: 9