Morphology Analysis for Hidden Markov Model based Indonesian Part-of-Speech Tagger

Authors
Muljono [1]
Afini, Umriya [1]
Supriyanto, Catur [1]
Affiliations
[1] Dian Nuswantoro Univ, Dept Informat Engn, Semarang, Indonesia
Keywords
POS tagging; hidden Markov model; morphological analysis; clitics; out-of-vocabulary
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Part-of-Speech (POS) tagging plays an important role in Natural Language Processing (NLP). It assigns each word a tag, such as noun, verb, or pronoun. Many automatic POS tagging approaches have been developed because manual POS tagging is time-consuming. The Hidden Markov Model (HMM) is a statistical method that is widely used for POS tagging. For Indonesian, HMM has been improved with an affix tree method that handles Out-of-Vocabulary (OOV) words and affixation. However, the affix tree provides no information for handling clitics. This study therefore proposes morphological analysis for Indonesian POS tagging. We combine MorphInd, a morphological analyzer, with HMM to improve POS tagging performance. The experiment uses 10,000 tokens for training and 3,000 tokens for testing. We prepare three test corpora containing 10%, 20%, and 30% OOV words, respectively. The experimental results show that the proposed method outperforms the other methods it is compared against.
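The general idea in the abstract, an HMM tagger that falls back on morphological information when it meets an OOV word carrying a clitic, can be illustrated with a minimal sketch. This is not the authors' implementation and does not call MorphInd: the `strip_enclitic` helper, the three-item `ENCLITICS` list, the add-one smoothing, and the toy corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical enclitic list: a toy stand-in for a real morphological analyzer.
ENCLITICS = ("nya", "ku", "mu")


def strip_enclitic(word, vocab):
    """Return the stem if removing one enclitic yields a known word, else None."""
    for suffix in ENCLITICS:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in vocab:
            return stem
    return None


def train(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumed smoothing choice)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most likely tag sequence for words under a bigram HMM."""
    if not words:
        return []
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    # OOV handling: replace an unknown word by its known stem when possible.
    lookup = [w if w in vocab else (strip_enclitic(w, vocab) or w) for w in words]
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    best = {
        t: (log_prob(trans["<s>"], t, len(tags))
            + log_prob(emit[t], lookup[0], n_words), [t])
        for t in tags
    }
    for w in lookup[1:]:
        step = {}
        for t in tags:
            score, path = max(
                ((best[p][0] + log_prob(trans[p], t, len(tags)), best[p][1])
                 for p in tags),
                key=lambda item: item[0],
            )
            step[t] = (score + log_prob(emit[t], w, n_words), path + [t])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


# Toy training data (invented for this sketch, not from the paper's corpus).
corpus = [
    [("saya", "PRP"), ("membaca", "VB"), ("buku", "NN")],
    [("dia", "PRP"), ("membeli", "VB"), ("buku", "NN")],
]
trans, emit = train(corpus)
predicted = viterbi(["saya", "membaca", "bukunya"], trans, emit)
print(predicted)
```

Here `bukunya` never appears in training, but stripping the enclitic `-nya` recovers the known stem `buku`, so the tagger can reuse that stem's emission counts instead of treating the word as fully unknown. A real analyzer would return much richer information (lemma, affixes, clitic tags) than this suffix check.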
Pages: 237-240
Number of pages: 4