A FRAMEWORK OF URDU TOPIC MODELING USING LATENT DIRICHLET ALLOCATION (LDA)

Cited by: 0
Authors
Shakeel, Khadija [1]
Tahir, Ghulam Rasool [2]
Tehseen, Irsha [1]
Ali, Mubashir [1]
Affiliations
[1] Univ Lahore, Dept CS & IT, Gujrat, Pakistan
[2] Allama Iqbal Open Univ, Dept CS & IT, Islamabad, Pakistan
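Keywords
Text Mining; Topic Model; Latent Dirichlet Allocation; Morphology; Gibbs Sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract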
The text mining research community has devoted considerable attention to the development of text mining tools, techniques, and models. Topic modeling is an area of text mining that is used in applications such as summarization, search, and semantic analysis. It uncovers the hidden topics in a large collection of documents or text, and it is equally important to related research areas such as Natural Language Processing (NLP), Machine Learning (ML), and statistics. Many topic models have been proposed in the literature for a variety of languages, such as English and Arabic. These models differ in their nature, underlying theory, and implementation strategy, because every language has its own morphological structure, semantics, and syntax. The motivation behind this work is that no such work is available for extracting topics from Urdu documents. Although standard topic models such as Latent Dirichlet Allocation (LDA) have been proposed, a comprehensive topic model tailored to Urdu text is still needed. In this research, we propose an effective topic model for the Urdu language that copes with the challenges of Urdu morphological structure. The proposed model is a framework that combines pre-processing techniques, the LDA model, and Gibbs sampling. Because it uses the standard LDA model, we name it Urdu Latent Dirichlet Allocation (ULDA). Experiments are conducted to show the efficacy of the proposed approach compared with competing approaches. The experimental results show that the proposed ULDA model outperforms existing systems. This is the first time such work has been carried out for the Urdu language.
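The pipeline named in the abstract (pre-processing, then LDA fitted by Gibbs sampling) can be illustrated with a minimal sketch. This is not the authors' ULDA implementation: the whitespace tokenizer, the tiny stop-word set, the toy sentences, the function names, and the hyperparameters `alpha` and `beta` are all assumptions chosen only to show how collapsed Gibbs sampling reassigns a topic to each token.

```python
import random
from collections import defaultdict

# Hypothetical stop-word set; a real Urdu pipeline would use a fuller list
# plus normalization and stemming, which this sketch does not attempt.
STOPWORDS = frozenset({"کی", "کا", "نے", "میں", "پر", "کیا"})

def tokenize(text, stopwords=STOPWORDS):
    """Whitespace tokenizer standing in for real Urdu pre-processing (assumption)."""
    return [tok for tok in text.split() if tok not in stopwords]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]                # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # tokens assigned to topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, topic_total

def top_words(topic_word, k, n=3):
    """Return up to n words with the highest nonzero count in topic k."""
    counts = {w: c for w, c in topic_word[k].items() if c > 0}
    return sorted(counts, key=counts.get, reverse=True)[:n]

# Toy corpus (invented for illustration): two sports and two politics sentences.
raw_texts = [
    "کرکٹ کی ٹیم نے میچ جیتا",
    "ٹیم کا کھلاڑی میچ میں زخمی",
    "حکومت نے بجٹ پیش کیا",
    "وزیر نے پارلیمنٹ میں بجٹ پر بحث کی",
]
docs = [tokenize(text) for text in raw_texts]
doc_topic, topic_word, topic_total = lda_gibbs(docs, num_topics=2)
for k in range(2):
    print(k, top_words(topic_word, k))
```

On a corpus this small the recovered topics are not stable, so the printed words should not be read as a result; the sketch only shows the sampling loop that a framework of this kind would run after Urdu-specific pre-processing.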
Pages: 117-123
Number of pages: 7