Topic Labeling Towards News Document Collection Based on Latent Dirichlet Allocation and Ontology

Authors
Adhitama, Rifki [1 ]
Kusumaningrum, Retno [2 ]
Gernowo, Rahmat [3 ]
Affiliations
[1] Univ Diponegoro, Informat Syst, Semarang, Indonesia
[2] Univ Diponegoro, Dept Informat, Semarang, Indonesia
[3] Univ Diponegoro, Dept Phys, Semarang, Indonesia
Keywords
text clustering; cluster labeling; latent Dirichlet allocation; ontology
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Latent Dirichlet Allocation (LDA) is a topic modeling method that provides the flexibility to organize, understand, search, and summarize electronic archives, and it has proven effective in text and information retrieval. A weakness of LDA is that it cannot label the topics it forms. This research combines LDA with an ontology scheme to overcome this weakness in topic labeling. The dataset consists of 50 news documents taken from an online news portal. The ontology scheme is based on the dictionary of the field contained in the "Kamus Besar Bahasa Indonesia (KBBI)". The experiment aims to find the best number of words to represent each topic in order to produce a relevant label name for the topic. Cohen's kappa coefficient is used to measure the reliability of the labels based on the agreement of two linguistic experts. The mean relevance rate is used to measure the average relevance value given by the linguistic experts to a label, for the word-count representations whose kappa value exceeds 41%. The results show that the highest kappa value, 100%, occurs with the 5-word representation of each topic, while the highest mean relevance rate, 80%, occurs with the 5-word and 30-word representations. The average kappa value is 61%, and the average mean relevance rate is 71%.
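The abstract relies on Cohen's kappa to quantify agreement between two linguistic experts. As a minimal sketch of how that coefficient is computed, the following uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa`, the label strings, and the example judgments are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal label frequencies,
    # summed over every label either rater used.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    p_e = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout, so kappa is 0/0.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments of five topic labels by two experts (assumed data).
expert_1 = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
expert_2 = ["relevant", "irrelevant", "irrelevant", "relevant", "irrelevant"]

kappa = cohen_kappa(expert_1, expert_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the experts agree on four of five labels (p_o = 0.8) and chance agreement is p_e = 0.48, so kappa is about 0.615. The 41% threshold mentioned in the abstract appears to match the lower bound of the "moderate agreement" band (0.41 to 0.60) in the commonly used Landis and Koch interpretation scale, though the abstract does not state this explicitly.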
Pages: 247-251
Number of pages: 5