A hybrid statistical and deep learning based technique for Persian part of speech tagging

Authors
Sara Besharati
Hadi Veisi
Ali Darzi
Seyed Habib Hosseini Saravani
Affiliations
[1] University of Tehran,Faculty of New Sciences and Technologies
[2] University of Tehran,Department of General Linguistics
[3] Sharif University of Technology,Computational Linguistics Group
Keywords
Persian POS tagging; Word vectors; Multi-layer perceptron (MLP); Long short-term memory (LSTM)
DOI
10.1007/s42044-020-00063-1
Abstract
In part-of-speech (POS) tagging, the main challenge is to predict the right tags for both in-vocabulary (IV) and out-of-vocabulary (OOV) words. Artificial neural networks such as the multi-layer perceptron (MLP) and long short-term memory (LSTM), which appear well suited because of their strong generalization capability, have therefore been applied to POS tagging to address this challenge. In this research, we use word vectors as the input to MLP and LSTM neural networks for POS tagging in Persian and compare the neural models with a second-order hidden Markov model (HMM), which serves as our benchmark. To investigate the effect of the number of hidden layers, we use both single-layer and two-layer MLP and LSTM networks. We also apply a bidirectional LSTM network to investigate the effect of bidirectional learning on Persian POS tagging. The results show that the neural models perform far better at predicting the correct POS tags for OOV words, which may be due to their stronger generalization. We therefore propose a hybrid model that combines the HMM with a single-layer bidirectional LSTM as a novel approach to POS tagging. This hybrid model improves on both the HMM and the neural models, increasing the accuracy to 97.29%.
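The abstract does not say how the HMM and the bidirectional LSTM are combined, so the following is only a minimal sketch of one plausible scheme, not the authors' method: route each word to one of two taggers depending on whether it is in the training vocabulary, then score IV and OOV words separately. The names `hybrid_tag`, `accuracy_by_vocab`, `toy_iv_tagger`, and `toy_oov_tagger` are hypothetical, and the two toy taggers are trivial stand-ins for a trained HMM and a trained neural model.

```python
from typing import Callable, Dict, List, Sequence, Set

# A tagger maps a sentence (sequence of words) to one tag per word.
Tagger = Callable[[Sequence[str]], List[str]]


def hybrid_tag(words: Sequence[str], vocab: Set[str],
               iv_tagger: Tagger, oov_tagger: Tagger) -> List[str]:
    """Assumed combination rule: keep the IV tagger's tag for known words,
    and the OOV tagger's tag for words outside the vocabulary."""
    iv_tags = iv_tagger(words)
    oov_tags = oov_tagger(words)
    return [iv if w in vocab else oov
            for w, iv, oov in zip(words, iv_tags, oov_tags)]


def accuracy_by_vocab(words: Sequence[str], gold: Sequence[str],
                      pred: Sequence[str], vocab: Set[str]) -> Dict[str, float]:
    """Report tagging accuracy separately for IV and OOV words."""
    counts = {"IV": [0, 0], "OOV": [0, 0]}  # [correct, total]
    for w, g, p in zip(words, gold, pred):
        key = "IV" if w in vocab else "OOV"
        counts[key][0] += int(g == p)
        counts[key][1] += 1
    return {k: (c / n if n else float("nan")) for k, (c, n) in counts.items()}


# Toy stand-ins (hypothetical): a lexicon lookup plays the role of the HMM,
# and a suffix rule plays the role of the neural model.
lexicon = {"the": "DET", "cat": "N", "sat": "V"}


def toy_iv_tagger(words: Sequence[str]) -> List[str]:
    return [lexicon.get(w, "N") for w in words]


def toy_oov_tagger(words: Sequence[str]) -> List[str]:
    return ["ADV" if w.endswith("ly") else "N" for w in words]


words = ["the", "cat", "sat", "quietly"]
gold = ["DET", "N", "V", "ADV"]
vocab = set(lexicon)

iv_only = toy_iv_tagger(words)
hybrid = hybrid_tag(words, vocab, toy_iv_tagger, toy_oov_tagger)

print(accuracy_by_vocab(words, gold, iv_only, vocab))
print(accuracy_by_vocab(words, gold, hybrid, vocab))
```

Splitting the score by vocabulary membership mirrors the IV/OOV comparison the abstract describes; a real system would replace the toy taggers with trained models and could combine them in other ways, for example by interpolating tag probabilities.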
Pages: 35-43
Page count: 8