Hierarchical Topic Modeling for Urdu Text Articles

Cited by: 2
Authors
Rehman, Anwar Ur [1]
Khan, Ali Haider [2]
Aftab, Mustansar [3]
Rehman, Zobia [1]
Shah, Munam Ali [1]
Affiliations
[1] Comsats Univ Islamabad, Dept Comp Sci, Islamabad, Pakistan
[2] Univ Management & Technol, Dept Comp Sci, Lahore, Pakistan
[3] Natl Coll Business Adm & Econ, Lahore, Pakistan
Keywords
Hierarchical Topic Model; Hierarchical LDA; Urdu Topic Model; Urdu Hierarchical LDA; Natural Language Processing; Gibbs Sampling
DOI
10.23919/iconac.2019.8895047
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Digital text on the Internet is growing rapidly with the widespread use of social media. Extracting useful information from this text is therefore challenging because of its high dimensionality, sparsity, and sheer volume. In this paper, we study a powerful nonparametric Bayesian topic model, Hierarchical Latent Dirichlet Allocation (hLDA), and address the problem of learning topic hierarchies from Urdu text. The presented topic model for Urdu combines preprocessing steps, the hLDA model, and the Gibbs sampling (GS) algorithm. We call this hLDA-based model Urdu Hierarchical Latent Dirichlet Allocation (uhLDA). An empirical study showed that uhLDA effectively learns topic hierarchies from 5,000 Urdu text documents. Furthermore, we evaluated the results using Pointwise Mutual Information (PMI), which shows that uhLDA outperforms the standard LDA topic model.
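The abstract evaluates topics with Pointwise Mutual Information but does not say which PMI variant was used. The sketch below is a minimal illustration under stated assumptions: probabilities come from document-level co-occurrence in a reference corpus, the natural log is used, a small `eps` smooths pairs that never co-occur, and a topic's score is the mean PMI over all pairs of its top words. The function name `pmi_coherence` and the toy documents are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Assumption: probabilities are estimated from document-level
    co-occurrence in a reference corpus (the paper's exact variant is unknown).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w_i, w_j in combinations(top_words, 2):
        p_i, p_j = prob(w_i), prob(w_j)
        if p_i == 0 or p_j == 0:
            continue  # skip pairs involving a word absent from the reference corpus
        # eps keeps log() finite when the pair never co-occurs
        scores.append(math.log((prob(w_i, w_j) + eps) / (p_i * p_j)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy tokenized Urdu "documents" (illustrative only, not the paper's data).
docs = [
    ["کرکٹ", "میچ", "ٹیم"],
    ["کرکٹ", "ٹیم"],
    ["حکومت", "وزیر"],
    ["حکومت", "بجٹ"],
]
print(round(pmi_coherence(["کرکٹ", "ٹیم"], docs), 3))  # 0.693
print(pmi_coherence(["کرکٹ", "حکومت"], docs) < 0)       # True
```

A higher average PMI suggests that a topic's top words co-occur more often than chance. Comparing two models such as uhLDA and LDA in this way would mean scoring every topic and averaging, though the paper's exact aggregation is not given in this record.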
Pages: 464-469
Page count: 6
Related papers
50 in total
  • [1] Statistical Topic Modeling for Urdu Text Articles
    Rehman, Anwar Ur
    Rehman, Zobia
    Akram, Junaid
    Ali, Waqar
    Shah, Munam Ali
    Salman, Muhammad
    [J]. 2018 24TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND COMPUTING (ICAC' 18), 2018, : 62 - 67
  • [2] Topic Modeling, Sentiment Analysis and Text Summarization for Analyzing News Headlines and Articles
    Thakur, Omswroop
    Saritha, Sri Khetwat
    Jain, Sweta
    [J]. MACHINE LEARNING, IMAGE PROCESSING, NETWORK SECURITY AND DATA SCIENCES, MIND 2022, PT I, 2022, 1762 : 220 - 239
  • [3] On Building an Interpretable Topic Modeling Approach for the Urdu Language
    Nasim, Zarmeen
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 5200 - 5201
  • [4] Hierarchical Summarization of Text Documents Using Topic Modeling and Formal Concept Analysis
    Akhtar, Nadeem
    Javed, Hira
    Ahmad, Tameem
    [J]. DATA MANAGEMENT, ANALYTICS AND INNOVATION, ICDMAI 2018, VOL 2, 2019, 839 : 21 - 33
  • [5] Hierarchical Theme and Topic Modeling
    Chien, Jen-Tzung
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (03) : 565 - 578
  • [7] An overview of Hierarchical topic modeling
    Liu, Lin
    Tang, Lin
    He, Libo
    Zhou, Wei
    Yao, Shaowen
    [J]. 2016 8TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS (IHMSC), VOL. 1, 2016, : 391 - 394
  • [8] An Attention Hierarchical Topic Modeling
    Yin, Chunyan
    Chen, Yongheng
    Zuo, Wanli
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2021, 31 (04) : 722 - 729
  • [9] Advanced Hierarchical Topic Labeling for Short Text
    Tiwari, Paras
    Tripathi, Ashutosh
    Singh, Avaneesh
    Rai, Sawan
    [J]. IEEE ACCESS, 2023, 11 : 35158 - 35174