Hierarchical Topic Modeling for Urdu Text Articles

Citations: 2
Authors
Rehman, Anwar Ur [1]
Khan, Ali Haider [2]
Aftab, Mustansar [3]
Rehman, Zobia [1]
Shah, Munam Ali [1]
Affiliations
[1] Comsats Univ Islamabad, Dept Comp Sci, Islamabad, Pakistan
[2] Univ Management & Technol, Dept Comp Sci, Lahore, Pakistan
[3] Natl Coll Business Adm & Econ, Lahore, Pakistan
Keywords
Hierarchical Topic Model; Hierarchical LDA; Urdu Topic Model; Urdu Hierarchical LDA; Natural Language Processing; Gibbs Sampling
DOI
10.23919/iconac.2019.8895047
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Digital text on the Internet is growing rapidly with the heavy use of social media. Extracting useful information from this text is challenging because of its high dimensionality, sparseness, and sheer volume. In this paper, we study a powerful nonparametric Bayesian topic model, Hierarchical Latent Dirichlet Allocation (hLDA), and address the problem of learning topic hierarchies from Urdu text data. The presented topic model for Urdu combines preprocessing steps, the hLDA model, and the Gibbs Sampling (GS) algorithm. We call this hLDA-based topic model Urdu Hierarchical Latent Dirichlet Allocation (uhLDA). An empirical study showed that uhLDA effectively learns topic hierarchies from 5000 Urdu text documents. Furthermore, we evaluated the results using Pointwise Mutual Information (PMI), which shows that uhLDA outperforms the standard LDA topic model.
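The abstract names PMI as the evaluation measure but does not give the formula or estimation details. As a rough illustration only, the sketch below computes one common variant, the average pairwise PMI of a topic's top words estimated from document-level co-occurrence. The function name `pmi_coherence`, the smoothing constant `eps`, and the toy corpus are assumptions made for this example; they are not taken from the paper.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated as the fraction of documents that
    contain a word (or both words of a pair). This is an assumed,
    commonly used variant, not necessarily the paper's exact measure.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    def prob(*words):
        # Fraction of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p1 == 0 or p2 == 0:
            continue  # skip words that never occur in the corpus
        # eps keeps the logarithm finite when the pair never co-occurs.
        scores.append(math.log((p12 + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Toy tokenized corpus (hypothetical data for illustration).
docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
coherent = pmi_coherence(["a", "b"], docs)    # words that always co-occur
incoherent = pmi_coherence(["a", "c"], docs)  # words that never co-occur
```

A higher score means the top words of a topic tend to appear in the same documents, which is the sense in which one topic model can be said to outperform another under PMI.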
Pages: 464-469
Number of pages: 6