A Novel Graph Based Clustering Approach to Document Topic Modeling

Authors
Chanda, Prateek [1]
Das, Asit Kumar [1]
Affiliation
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Sibpur, Howrah, India
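Keywords
Text mining; Document clustering; Graph based clustering; Importance factor; Newsgroup20 dataset
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract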
Clustering is the task of assigning a set of objects to groups so that, under some similarity measure, objects within the same cluster are more similar to each other than to those in other clusters. Clustering documents by their research topics is an important task in text mining. In this field, cluster analysis groups a set of documents so that documents in the same cluster share a similar topic and documents in different clusters have different topics. This paper proposes a novel graph based clustering method that uses an importance factor for each document, computed with a mathematical approach claimed to improve on well-known classical methods. The document with the maximum importance factor in a cluster is taken as the centroid of that cluster. A publicly available synthetic dataset is used to evaluate the performance of the proposed algorithm, and the method is compared with some traditional graph based methods to demonstrate its accuracy.
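The abstract does not give the formula for the importance factor or the clustering rule, so the following Python sketch only illustrates the general idea under stated assumptions. Documents become nodes of a similarity graph, clusters are taken to be connected components, and the node with the highest importance in each cluster is chosen as its centroid. The function names, the cosine-similarity threshold, and the use of weighted degree as the importance factor are all hypothetical stand-ins, not the authors' method.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Bag-of-words term counts; a stand-in for any document representation."""
    return [Counter(d.lower().split()) for d in docs]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(docs, threshold=0.3):
    """Connect two documents when their similarity reaches the threshold.

    ASSUMPTION: the threshold value and the similarity measure are
    illustrative choices, not taken from the paper.
    """
    vecs = tf_vectors(docs)
    graph = {i: {} for i in range(len(docs))}
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            s = cosine(vecs[i], vecs[j])
            if s >= threshold:
                graph[i][j] = s
                graph[j][i] = s
    return graph


def importance(graph, node):
    """ASSUMPTION: weighted degree used as a placeholder importance factor."""
    return sum(graph[node].values())


def clusters(graph):
    """Clusters as connected components, found by depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(sorted(component))
    return result


def centroid(graph, cluster):
    """The document with the maximum importance factor in the cluster."""
    return max(cluster, key=lambda n: importance(graph, n))


docs = [
    "graph clustering of text documents",
    "clustering text documents with a graph",
    "graph based text clustering",
    "football match score goal",
    "goal scored in football match",
]
graph = build_graph(docs)
for cluster in clusters(graph):
    print(cluster, "centroid:", centroid(graph, cluster))
```

Any other importance measure, such as a centrality score, could be swapped into `importance` without changing the rest of the sketch.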
Pages: 7