Nonparametric method of topic identification using granularity concept and graph-based modeling

Times cited: 4
Authors
Ganguli, Isha [1]
Sil, Jaya [1]
Sengupta, Nandita [2]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Sibpur, Howrah, India
[2] Univ Coll Bahrain, Dept Informat Technol, Janabiyah, Bahrain
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / No. 02
Keywords
Granularity; Point-wise mutual information; Graph-based modeling; Hierarchical structure; Computationally efficient algorithm; DOCUMENT; CLASSIFICATION;
DOI
10.1007/s00521-020-05662-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper aims to classify large unstructured documents into different topics without requiring extensive computational resources or a priori knowledge. The concept of granularity is employed to extract contextual information from the documents by hierarchically generating granules of words (GoWs). The proposed granularity-based word grouping (GBWG) algorithm groups words at different layers in a computationally efficient way, using a co-occurrence measure between the words of different granules. The GBWG algorithm terminates when no new GoW is generated at any layer of the hierarchical structure. Multiple GoWs are thus obtained, each containing contextually related words that represent a topic. However, the GoWs may share common words, which creates ambiguity in topic identification. The Louvain graph clustering algorithm is therefore employed to identify the topics automatically, each containing unique words, using mutual information as the association measure between the words (nodes) of each GoW. A test document is classified into a particular topic based on the probability of its unique words belonging to the different topics. The performance of the proposed method has been compared with other unsupervised, semi-supervised, and supervised topic modeling algorithms. Experiments show that the proposed method is comparable to or better than state-of-the-art topic modeling algorithms, a result further verified statistically with the Wilcoxon rank-sum test.
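The abstract describes three steps: scoring word associations with (point-wise) mutual information, grouping words into topics that contain unique words, and assigning a test document by the probability that its words belong to each topic. Below is a minimal Python sketch of those steps under stated assumptions. It is not the authors' GBWG implementation. PMI is estimated here from document-level co-occurrence, which is an assumption. Connected components of the positive-PMI word graph serve as a simple stand-in for Louvain clustering. The function names `pmi_scores`, `word_groups`, and `classify` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs):
    """Document-level PMI for every word pair that co-occurs in some document.

    PMI(a, b) = log(p(a, b) / (p(a) * p(b))), where each probability is the
    fraction of documents containing the word (or both words). Assumption:
    the paper's exact co-occurrence measure is not reproduced here.
    """
    n = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each sorted word pair
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        (a, b): math.log((c_ab / n) / ((word_df[a] / n) * (word_df[b] / n)))
        for (a, b), c_ab in pair_df.items()
    }


def word_groups(scores, threshold=0.0):
    """Partition words into disjoint groups (hypothetical stand-in for Louvain).

    Words are linked when their PMI exceeds `threshold`; each connected
    component (found with union-find) becomes one topic, so every word
    belongs to exactly one topic.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), s in scores.items():
        root_a, root_b = find(a), find(b)  # also registers both words
        if s > threshold and root_a != root_b:
            parent[root_a] = root_b
    groups = {}
    for w in list(parent):
        groups.setdefault(find(w), set()).add(w)
    return sorted(groups.values(), key=min)  # deterministic topic order


def classify(test_doc, topics):
    """Assign a document to the topic holding the largest share of its words.

    Assumption: P(topic) = (test words in topic) / (test words found in any topic).
    Returns (None, {}) when no test word belongs to any topic.
    """
    words = set(test_doc)
    hits = [len(words & topic) for topic in topics]
    total = sum(hits)
    if total == 0:
        return None, {}
    probs = {i: h / total for i, h in enumerate(hits)}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    corpus = [
        ["goal", "match", "team"],
        ["goal", "team", "score"],
        ["vote", "party", "election"],
        ["party", "election", "poll"],
    ]
    topics = word_groups(pmi_scores(corpus))
    best, probs = classify(["team", "goal", "weather"], topics)
    print(sorted(topics[best]), probs)
```

To follow the paper more closely, the union-find step could be replaced by a Louvain implementation such as networkx's `louvain_communities`, run on a graph weighted by positive PMI. Union-find is used here only to keep the sketch dependency-free.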
Pages: 1055-1075
Number of pages: 21