Nonparametric method of topic identification using granularity concept and graph-based modeling

Cited by: 4
Authors
Ganguli, Isha [1]
Sil, Jaya [1]
Sengupta, Nandita [2]
Affiliations
[1] Indian Inst Engn Sci & Technol, Dept Comp Sci & Technol, Sibpur, Howrah, India
[2] Univ Coll Bahrain, Dept Informat Technol, Janabiyah, Bahrain
Source
NEURAL COMPUTING & APPLICATIONS | 2023 / Vol. 35 / No. 02
Keywords
Granularity; Point-wise mutual information; Graph-based modeling; Hierarchical structure; Computationally efficient algorithm; DOCUMENT; CLASSIFICATION;
DOI
10.1007/s00521-020-05662-4
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper aims to classify large unstructured documents into different topics without requiring huge computational resources or a priori knowledge. The concept of granularity is employed to extract contextual information from the documents by hierarchically generating granules of words (GoWs). The proposed granularity-based word grouping (GBWG) algorithm groups words at different layers in a computationally efficient way, using a co-occurrence measure between the words of different granules. The GBWG algorithm terminates when no new GoW is generated at any layer of the hierarchical structure. Thus, multiple GoWs are obtained. Each contains contextually related words and represents a different topic. However, the GoWs may contain common words, which creates ambiguity in topic identification. The Louvain graph clustering algorithm is therefore employed to automatically identify topics containing unique words, using mutual information as the association measure between the words (nodes) of each GoW. A test document is classified into a particular topic based on the probability of its unique words belonging to the different topics. The performance of the proposed method is compared with other unsupervised, semi-supervised, and supervised topic modeling algorithms. Experimental results show that the proposed method is comparable to or better than state-of-the-art topic modeling algorithms, a finding further verified statistically with the Wilcoxon rank-sum test.
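Two steps in the abstract are easy to misread: the word-association measure and the final assignment of a test document. What follows is a minimal sketch under stated assumptions. It is not the authors' GBWG or Louvain implementation.

The sketch computes pointwise mutual information (PMI, one of the listed keywords) as PMI(w1, w2) = log2( p(w1, w2) / (p(w1) p(w2)) ). It counts co-occurrence at the document level, so p(w) is the fraction of documents containing w. It then scores a test document against each topic by the share of its tokens that fall in that topic's unique-word set.

Several elements are assumptions made for illustration: document-level counting, the scoring rule, the function names `pmi_table` and `classify`, and the toy corpus. The paper may use windowed co-occurrence and a different probability estimate.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Pointwise mutual information for every word pair that co-occurs in a document.

    Assumption: co-occurrence is counted at document level, so
    p(w) = (#docs containing w) / N and p(w1, w2) = (#docs containing both) / N.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    word_df = Counter(w for s in doc_sets for w in s)
    # Sort each document's vocabulary so every pair is stored in one canonical order.
    pair_df = Counter(p for s in doc_sets for p in combinations(sorted(s), 2))
    table = {}
    for (w1, w2), count in pair_df.items():
        p_joint = count / n
        p1, p2 = word_df[w1] / n, word_df[w2] / n
        table[(w1, w2)] = math.log2(p_joint / (p1 * p2))
    return table


def classify(doc, topics):
    """Assign doc to the topic whose unique-word set covers the largest share of its tokens.

    Hypothetical scoring rule: score(topic) = (#tokens in topic's word set) / (#tokens).
    Returns (best_topic, scores); ties go to the first topic in `topics`.
    """
    scores = {
        name: (sum(t in words for t in doc) / len(doc) if doc else 0.0)
        for name, words in topics.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy corpus: two "sports" and two "politics" documents.
    corpus = [
        ["goal", "match", "team"],
        ["team", "coach", "match"],
        ["vote", "party", "election"],
        ["party", "vote", "campaign"],
    ]
    pmi = pmi_table(corpus)
    print(pmi[("match", "team")])  # 1.0
    topics = {
        "sports": {"goal", "match", "team", "coach"},
        "politics": {"vote", "party", "election", "campaign"},
    }
    print(classify(["team", "vote", "match", "score"], topics))
```

In the toy corpus, `match` and `team` co-occur in both sports documents, so their PMI is log2(0.5 / (0.5 × 0.5)) = 1.0. The test document has two of its four tokens in the sports word set and one in the politics word set, so it is assigned to "sports".

In a fuller pipeline, the positive-PMI pairs inside each GoW could serve as weighted edges for a Louvain run. For example, networkx's `louvain_communities(G, weight="weight")` could be used for this step, though this is only one possible choice.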
Pages: 1055-1075
Number of pages: 21