Statistical Topic Modeling for Urdu Text Articles

Authors
Rehman, Anwar Ur [1]
Rehman, Zobia
Akram, Junaid
Ali, Waqar
Shah, Munam Ali
Salman, Muhammad
Affiliations
[1] CUI, Dept Comp Sci, Islamabad, Pakistan
Keywords
Urdu; topic modeling; Artificial Intelligence; Urdu Language Processing; Natural Language Processing; variational Bayes
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Natural Language Processing (NLP) is a branch of Artificial Intelligence that helps computers manipulate and interpret human languages. Within NLP, text mining is a technique for deriving useful information from text. A Topic Model (TM) is a statistical model that extracts topics from a large collection of unlabeled text using NLP and machine learning techniques. Several effective TMs are available for languages such as English, German, and Arabic. However, no compelling TM is available for Urdu, a low-resource South Asian language. In this study, we build on an existing TM, Latent Dirichlet Allocation (LDA), to address the issues that Urdu poses for text mining. We studied and analyzed LDA as an unsupervised model for Urdu topic identification in two configurations: Variational Bayes (VB) based LDA for Urdu (VB-ULDA) with a stemmer and without a stemmer. Experiments were performed on a large, self-created collection of Urdu documents organized into four corpora. The experiments show that VB-ULDA outperforms the existing Urdu LDA (ULDA) at identifying topics in Urdu text documents in terms of accuracy and efficiency, and the results also reveal that the stemming algorithm has a high impact on Urdu topic identification.
Pages: 62-67
Number of pages: 6