Nonparametric method of topic identification using granularity concept and graph-based modeling

Cited by: 4
Authors
Ganguli, Isha [1]
Sil, Jaya [1]
Sengupta, Nandita [2]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Sibpur, Howrah, India
[2] Univ Coll Bahrain, Dept Informat Technol, Janabiyah, Bahrain
Source
NEURAL COMPUTING & APPLICATIONS | 2023, Vol. 35, Issue 02
Keywords
Granularity; Point-wise mutual information; Graph-based modeling; Hierarchical structure; Computationally efficient algorithm; DOCUMENT; CLASSIFICATION;
DOI
10.1007/s00521-020-05662-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper aims to classify large unstructured documents into different topics without requiring huge computational resources or a priori knowledge. The concept of granularity is employed to extract contextual information from the documents by generating granules of words (GoWs) hierarchically. The proposed granularity-based word grouping (GBWG) algorithm groups the words at different layers in a computationally efficient way, using a co-occurrence measure between the words of different granules. The GBWG algorithm terminates when no new GoW is generated at any layer of the hierarchical structure. Multiple GoWs are thus obtained, each containing contextually related words and representing a different topic. However, the GoWs may contain common words, creating ambiguity in topic identification. The Louvain graph clustering algorithm is therefore employed to automatically identify topics containing unique words, using mutual information as an association measure between the words (nodes) of each GoW. A test document is classified into a particular topic based on the probability that its unique words belong to the different topics. The performance of the proposed method has been compared with other unsupervised, semi-supervised, and supervised topic modeling algorithms. Experiments show that the proposed method is comparable to or better than the state-of-the-art topic modeling algorithms, a result further verified statistically with the Wilcoxon rank-sum test.
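As a rough illustration of the point-wise mutual information measure named in the keywords, the sketch below scores word pairs by how often they share a document. It is not the authors' GBWG algorithm: the abstract does not specify the co-occurrence window or the probability estimates, so document-level co-occurrence, the function name `pmi_table`, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Return document-level PMI for every word pair that co-occurs.

    docs: list of token lists. Probabilities are estimated as the
    fraction of documents containing a word (or both words of a pair).
    This estimator is an assumption, not taken from the paper.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one key
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    pmi = {}
    for (w1, w2), n12 in pair_df.items():
        # PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) )
        pmi[(w1, w2)] = math.log2(n12 * n_docs / (word_df[w1] * word_df[w2]))
    return pmi


# Hypothetical toy corpus, for illustration only.
docs = [
    ["graph", "cluster", "node"],
    ["graph", "node", "edge"],
    ["topic", "word", "document"],
    ["topic", "document", "cluster"],
]
scores = pmi_table(docs)
```

Pairs that always appear together (for example `("graph", "node")` here) receive a positive score, pairs that co-occur only as often as chance would predict score zero, and pairs that never co-occur are absent from the table. Scores of this kind could serve as edge weights in a word graph before a clustering step such as Louvain.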
Pages: 1055-1075
Page count: 21