Morphology Analysis for Hidden Markov Model based Indonesian Part-of-Speech Tagger

Authors
Muljono [1]
Afini, Umriya [1]
Supriyanto, Catur [1]
Affiliations
[1] Dian Nuswantoro Univ, Dept Informat Engn, Semarang, Indonesia
Keywords
POS tagging; hidden Markov model; morphological analysis; clitics; out-of-vocabulary
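DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract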
Part-of-Speech (POS) tagging plays an important role in Natural Language Processing (NLP). It assigns each word a tag such as noun, verb, or pronoun. Many automatic POS tagging approaches have been developed because manual POS tagging is time-consuming. The Hidden Markov Model (HMM) is a statistical method that is widely used for POS tagging. For the Indonesian language, HMM has been improved with an affix tree method, which handles the Out-of-Vocabulary (OOV) word problem and affixation. However, the affix tree does not provide any information for handling clitics. Therefore, this study proposes morphological analysis for Indonesian POS tagging. We combine MorphInd as the morphological analyzer with HMM to improve POS tagging performance. The experiment uses 10,000 tokens for training and 3,000 tokens for testing. We prepare three test corpora, containing 10%, 20%, and 30% OOV words respectively. The experimental results show that the proposed method achieves better performance than the other methods.
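For readers unfamiliar with HMM tagging, the following is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the authors' implementation: it contains no affix tree and does not call MorphInd. The add-one smoothing used to give OOV words a nonzero emission probability, the tag names (PRP, VB, NN), the function names, and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            word = word.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, vocab


def viterbi(words, trans, emit, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    words = [w.lower() for w in words]
    tags = sorted(emit)
    n_tags = len(tags)
    v_size = len(vocab) + 1  # one extra slot shared by all unseen words

    def log_trans(prev, tag):
        counts = trans.get(prev, {})
        total = sum(counts.values())
        # Add-one smoothing (an assumption of this sketch).
        return math.log((counts.get(tag, 0) + 1) / (total + n_tags))

    def log_emit(tag, word):
        counts = emit.get(tag, {})
        total = sum(counts.values())
        # OOV words get the same small smoothed probability under every tag,
        # so the transition model alone decides their tag.
        return math.log((counts.get(word, 0) + 1) / (total + v_size))

    # best[i][t] = (log-probability of best path ending in t, backpointer)
    best = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, words[i]), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


# Hypothetical toy corpus; the paper's corpus and tagset are not reproduced here.
toy_corpus = [
    [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN")],
    [("dia", "PRP"), ("minum", "VB"), ("air", "NN")],
    [("saya", "PRP"), ("minum", "VB"), ("air", "NN")],
]
trans, emit, vocab = train_hmm(toy_corpus)
seen_result = viterbi(["dia", "makan", "nasi"], trans, emit, vocab)
oov_result = viterbi(["saya", "membeli", "air"], trans, emit, vocab)  # "membeli" is OOV
print(seen_result, oov_result)
```

In this sketch an OOV word is tagged purely from its neighbouring tags. The abstract's point is that word-internal information, such as affixes and clitics recovered by a morphological analyzer, can supply the evidence that plain smoothing lacks.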
Pages: 237-240
Number of pages: 4