Nonparametric method of topic identification using granularity concept and graph-based modeling

Cited by: 4
Authors
Ganguli, Isha [1]
Sil, Jaya [1]
Sengupta, Nandita [2]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Sibpur, Howrah, India
[2] Univ Coll Bahrain, Dept Informat Technol, Janabiyah, Bahrain
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / Issue 02
Keywords
Granularity; Point-wise mutual information; Graph-based modeling; Hierarchical structure; Computationally efficient algorithm; DOCUMENT; CLASSIFICATION;
DOI
10.1007/s00521-020-05662-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper aims to classify large unstructured documents into topics without requiring extensive computational resources or a priori knowledge. The concept of granularity is employed to extract contextual information from the documents by hierarchically generating granules of words (GoWs). The proposed granularity-based word grouping (GBWG) algorithm groups words at different layers in a computationally efficient way, using a co-occurrence measure between the words of different granules. The GBWG algorithm terminates when no new GoW is generated at any layer of the hierarchical structure. Multiple GoWs are thus obtained, each containing contextually related words and representing a different topic. However, the GoWs may share common words, which creates ambiguity in topic identification. The Louvain graph clustering algorithm is therefore employed to automatically identify topics containing unique words, using mutual information as an association measure between the words (nodes) of each GoW. A test document is classified into a particular topic based on the probability that its unique words belong to the different topics. The performance of the proposed method is compared with that of other unsupervised, semi-supervised, and supervised topic modeling algorithms. Experiments show that the proposed method is comparable to or better than state-of-the-art topic modeling algorithms, a result that is further verified statistically with the Wilcoxon rank-sum test.
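The keywords list point-wise mutual information (PMI) as the association measure between words. The abstract does not give the exact formulation, so the following is only a minimal sketch of one common variant, assuming document-level co-occurrence and base-2 logarithms. The function name `pmi_scores` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return {(w1, w2): PMI} for word pairs that co-occur in a document.

    Assumption: two words co-occur when they appear in the same document.
    The paper may use a different window or estimator.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        vocab = sorted(set(doc))  # count each word once per document
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))

    scores = {}
    for (w1, w2), n_pair in pair_counts.items():
        p_pair = n_pair / n_docs
        p1 = word_counts[w1] / n_docs
        p2 = word_counts[w2] / n_docs
        # PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["graph", "cluster", "node"],
    ["graph", "node", "edge"],
    ["topic", "word", "document"],
    ["topic", "document", "cluster"],
]
scores = pmi_scores(docs)
```

Pair keys are alphabetically ordered tuples, so `scores[("graph", "node")]` holds the score for that pair. Scores like these could serve as edge weights in a word graph before a community detection step such as Louvain, although the weighting actually used by the authors is not stated in this record.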
Pages: 1055 - 1075
Number of pages: 21