Topic Labeling Towards News Document Collection Based on Latent Dirichlet Allocation and Ontology

Cited by: 0
|
Authors
Adhitama, Rifki [1 ]
Kusumaningrum, Retno [2 ]
Gernowo, Rahmat [3 ]
Affiliations
[1] Univ Diponegoro, Informat Syst, Semarang, Indonesia
[2] Univ Diponegoro, Dept Informat, Semarang, Indonesia
[3] Univ Diponegoro, Dept Phys, Semarang, Indonesia
Keywords
text clustering; cluster labeling; latent Dirichlet allocation; ontology;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation & Computer Technology];
Subject Classification Code
0812;
Abstract
Latent Dirichlet Allocation (LDA) is a topic modeling method that provides the flexibility to organize, understand, search, and summarize electronic archives, and it has proven effective in text mining and information retrieval. A weakness of LDA, however, is its inability to assign labels to the topics it discovers. This research combines LDA with an ontology scheme to overcome this topic-labeling weakness. The study uses a dataset of 50 news documents taken from an online news portal. The ontology scheme is based on the subject-field dictionary contained in the "Kamus Besar Bahasa Indonesia (KBBI)". The experiment aims to find the number of words representing each topic that produces the most relevant label name for that topic. Cohen's kappa coefficient is used to measure label reliability based on the agreement of two linguistic experts, while the mean relevance rate measures the average relevance score assigned by the experts to labels whose word representation achieves a kappa value above 41%. The results indicate that the highest kappa value, 100%, occurs with the five-word representation of each topic, while the highest mean relevance rate, 80%, occurs with the 5-word and 30-word representations. The average kappa value is 61%, and the average mean relevance rate is 71%.
Pages: 247-251
Page count: 5
Related Papers
50 records
  • [1] Topic Selection in Latent Dirichlet Allocation
    Wang, Biao
    Liu, Zelong
    Li, Maozhen
    Liu, Yang
    Qi, Man
    2014 11TH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (FSKD), 2014, : 756 - 760
  • [2] Crowd labeling latent Dirichlet allocation
    Pion-Tonachini, Luca
    Makeig, Scott
    Kreutz-Delgado, Ken
    KNOWLEDGE AND INFORMATION SYSTEMS, 2017, 53 (03) : 749 - 765
  • [4] An Ontology Term Extracting Method Based on Latent Dirichlet Allocation
    Yu Jing
    Wang Junli
    Zhao Xiaodong
    2012 FOURTH INTERNATIONAL CONFERENCE ON MULTIMEDIA INFORMATION NETWORKING AND SECURITY (MINES 2012), 2012, : 366 - 369
  • [5] Mining Web Log Data for News Topic Modeling Using Latent Dirichlet Allocation
    Surjandari, Isti
    Rosyidah, Asma
    Zulkarnain
    Laoh, Enrico
    2018 5TH INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE 2018), 2018, : 331 - 335
  • [6] Classification of Indonesian News Articles based on Latent Dirichlet Allocation
    Kusumaningrum, Retno
    Adhy, Satriyo
    Wiedjayanto, M. Ihsan Aji
    Suryono
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE), 2016,
  • [7] Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation
    Jeon, Hyung-Bae
    Lee, Soo-Young
    ETRI JOURNAL, 2016, 38 (03) : 487 - 493
  • [8] Latent Dirichlet Allocation for Automatic Document Categorization
    Biro, Istvan
    Szabo, Jacint
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 430 - 441
  • [9] Topic Modeling Using Latent Dirichlet allocation: A Survey
    Chauhan, Uttam
    Shah, Apurva
    ACM COMPUTING SURVEYS, 2021, 54 (07)
  • [10] A Hybrid Latent Dirichlet Allocation Approach for Topic Classification
    Hsu, Chi-I
    Chiu, Chaochang
    2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (INISTA), 2017, : 312 - 315