Extraction of Proper Names from Myanmar Text Using Latent Dirichlet Allocation

Authors
Win, Yuzana [1]
Masada, Tomonari [1]
Affiliations
[1] Nagasaki Univ, Grad Sch Engn, Nagasaki, Japan
Source
2016 CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI), 2016
Keywords
LDA; LSI; rule-based; K-means clustering
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a method for extracting proper names from Myanmar text using latent Dirichlet allocation (LDA). The method aims to extract proper names that convey important information about the contents of the text. It consists of two steps. In the first step, we extract topic words from Myanmar news articles using LDA. In the second step, we apply post-processing, because the resulting topic words contain some noisy words. The post-processing first eliminates the topic words whose prefixes are Myanmar digits and whose suffixes are noun and verb particles. We then remove duplicate words and discard the topic words that are contained in an existing dictionary. The remaining words are candidates for proper names: personal names, geographical names, unique object names, organization names, single event names, and so on. The evaluation is performed from both subjective and quantitative perspectives. From the subjective perspective, we compare the accuracy of the proper names extracted by our method with that of the names extracted by latent semantic indexing (LSI) and by a rule-based method. Both LSI and our method are shown to be more accurate than the rule-based method, but our method provides more interesting proper names than LSI. From the quantitative perspective, we use the extracted proper names as additional features in K-means clustering. The experimental results show that the document clusters given by our method are better than those given by LSI and the rule-based method in precision, recall, and F-score.
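The post-processing step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `filter_topic_words`, its parameters, and the toy inputs are hypothetical, and the sketch assumes that a word is dropped when it either starts with a Myanmar digit (Unicode U+1040 to U+1049) or ends with one of the supplied particle strings. The actual particle list and dictionary used in the paper are not given in this record, so both are passed in by the caller.

```python
# Myanmar digits occupy the Unicode range U+1040..U+1049.
MYANMAR_DIGITS = frozenset(chr(cp) for cp in range(0x1040, 0x104A))


def filter_topic_words(topic_words, particle_suffixes, dictionary):
    """Reduce LDA topic words to proper-name candidates (hypothetical sketch).

    topic_words: iterable of words taken from the LDA topics.
    particle_suffixes: noun/verb particle strings (caller-supplied assumption).
    dictionary: set of known common words (caller-supplied assumption).
    """
    suffixes = tuple(particle_suffixes)
    seen = set()
    candidates = []
    for word in topic_words:
        if not word:
            continue
        if word[0] in MYANMAR_DIGITS:      # drop words that start with a Myanmar digit
            continue
        if word.endswith(suffixes):        # drop words that end with a particle
            continue
        if word in dictionary:             # drop words already in the dictionary
            continue
        if word in seen:                   # drop duplicates, keeping first occurrence
            continue
        seen.add(word)
        candidates.append(word)
    return candidates


# Toy data only: Latin placeholders stand in for segmented Myanmar words,
# and "-NP" / "-VP" stand in for real noun and verb particles.
toy_words = ["\u1042\u1040", "yangon", "yangon", "go-VP", "house", "mandalay"]
toy_result = filter_topic_words(
    toy_words,
    particle_suffixes=("-NP", "-VP"),
    dictionary={"house"},
)
```

With the toy inputs, only the two placeholder place names survive the filters. In practice the input would be the top words of each LDA topic after Myanmar word segmentation, which this sketch does not cover.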
Pages: 96-103
Number of pages: 8