Topic Labeling Towards News Document Collection Based on Latent Dirichlet Allocation and Ontology

Cited by: 0
Authors
Adhitama, Rifki [1]
Kusumaningrum, Retno [2]
Gernowo, Rahmat [3]
Affiliations
[1] Univ Diponegoro, Informat Syst, Semarang, Indonesia
[2] Univ Diponegoro, Dept Informat, Semarang, Indonesia
[3] Univ Diponegoro, Dept Phys, Semarang, Indonesia
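Keywords
text clustering; cluster labeling; latent Dirichlet allocation; ontology
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract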
Latent Dirichlet Allocation (LDA) is a topic modeling method for organizing, understanding, searching, and summarizing electronic archives, and it has proven effective in text and information retrieval. Its weakness is that it cannot label the topics it forms. This research combines LDA with an ontology scheme to overcome that weakness. The study uses a dataset of 50 news documents taken from an online news portal. The ontology scheme is based on the field dictionary contained in "Kamus Besar Bahasa Indonesia (KBBI)". The experiment aims to find the best number of representative words per topic for producing a relevant label name for that topic. Cohen's kappa coefficient is used to measure the reliability of the labels based on the agreement of two linguistic experts, while the mean relevance rate measures the average relevance value that the linguistic experts assign to a label, computed for those word-count representations whose kappa value exceeds 41%. The results indicate that the highest kappa value, 100%, occurs with the five-word representation of each topic, while the highest mean relevance rate, 80%, occurs with the 5-word and 30-word representations. The average kappa value is 61%, and the average mean relevance rate is 71%.
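The abstract relies on Cohen's kappa to quantify agreement between two linguistic experts. As a minimal sketch of how that coefficient is computed, the Python function below uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohen_kappa`, the "relevant"/"irrelevant" categories, and the sample ratings are illustrative assumptions; they are not taken from the paper, whose rating scheme is not described in this record.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who judged the same items.

    Hypothetical helper: the name and interface are assumptions made
    for illustration, not the authors' implementation.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")

    n = len(rater_a)

    # p_o: fraction of items on which the two raters gave the same judgment.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: chance agreement, from each rater's marginal category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical category for every item, so the
        # usual formula would divide by zero; report perfect agreement.
        return 1.0

    return (observed - expected) / (1 - expected)


# Hypothetical judgments by two experts on five topic labels.
expert_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
expert_2 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]

print(cohen_kappa(expert_1, expert_2))
```

With identical judgments, as in this made-up example, the observed agreement is 1 and the function returns a kappa of 1.0, which corresponds to the 100% figure in the abstract's notation. Disagreements lower the observed agreement and pull kappa toward 0, the value expected when raters agree only by chance.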
Pages: 247-251
Number of pages: 5