Deep Learning based Tamil Parts of Speech (POS) Tagger

Authors
Anbukkarasi, S. [1]
Varadhaganapathy, S. [2]
Affiliations
[1] Kongu Engn Coll, Dept Comp Sci & Engn, Perundurai, Tamil Nadu, India
[2] Kongu Engn Coll, Dept Informat Technol, Perundurai, Tamil Nadu, India
Keywords
POS tagging; deep learning model; natural language processing; Bi-LSTM
DOI
10.24425/bpasts.2021.138820
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
This paper addresses part-of-speech (POS) tagging for Tamil, a low-resource, agglutinative language. POS tagging is the process of assigning a syntactic category to each word in a sentence, and it is a preliminary step for many natural language processing (NLP) tasks. For this work, several sequential deep learning models were used at the word level: a recurrent neural network (RNN), Long Short-Term Memory (LSTM), a Gated Recurrent Unit (GRU) and Bi-directional Long Short-Term Memory (Bi-LSTM). The models were evaluated using precision, recall, F1-score and accuracy. A tag set of 32 tags and a corpus of 225 000 tagged Tamil words were used for training. To find an appropriate number of hidden states, the models were trained with 4, 16, 32 and 64 hidden states. The experiments indicated that increasing the number of hidden states improves model performance. Among all the combinations, Bi-LSTM with 64 hidden states achieved the best accuracy (94%). For Tamil POS tagging, this is the first attempt to use a deep learning model.
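The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `tagging_metrics`, the toy tag names, and the choice of macro-averaging over tags are all assumptions made for illustration, since the abstract does not say how the per-tag scores were averaged.

```python
from collections import Counter


def tagging_metrics(gold, pred):
    """Accuracy plus macro-averaged precision, recall and F1 for two tag lists.

    Hypothetical helper for illustration; macro-averaging is an assumption.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct tag
        else:
            fp[p] += 1          # predicted tag was wrong
            fn[g] += 1          # gold tag was missed

    tags = sorted(set(gold) | set(pred))
    precisions, recalls, f1s = [], [], []
    for t in tags:
        p_den = tp[t] + fp[t]
        r_den = tp[t] + fn[t]
        prec = tp[t] / p_den if p_den else 0.0
        rec = tp[t] / r_den if r_den else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(tags)
    return {
        "accuracy": sum(tp.values()) / len(gold),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with made-up tags (not the paper's 32-tag set).
gold_tags = ["NN", "VB", "NN", "ADJ"]
pred_tags = ["NN", "VB", "ADJ", "ADJ"]
scores = tagging_metrics(gold_tags, pred_tags)
```

With word-level tagging, accuracy is the fraction of words whose predicted tag matches the gold tag, while precision, recall and F1 are computed per tag and then averaged, so rare tags weigh as much as frequent ones under the macro-averaging assumed here.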
Pages: 6