AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach

Cited by: 6
Authors
Pathak, Dhrubajyoti [1]
Nandi, Sukumar [1]
Sarmah, Priyankoo [1]
Affiliations
[1] Indian Inst Technol Guwahati, Ctr Linguist Sci & Technol, North Guwahati, India
Keywords
Assamese POS; DL based POS tagger; part of speech tagger; AsPOS; Assamese text analytics; LANGUAGE;
DOI
10.1109/AICCSA56895.2022.10017934
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Part of Speech (POS) tagging is crucial to Natural Language Processing (NLP) and is well studied for several resource-rich languages. For many languages that are rich in history and literature, however, the development of computational linguistic resources is still in its infancy. Assamese, a scheduled language of India spoken by more than 25 million people, falls into this category. In this paper, we present a Deep Learning (DL)-based POS tagger for Assamese. The development process is divided into two phases. In the first phase, several pre-trained word embeddings are used to train several tagging models, which allows us to evaluate how each embedding performs on the POS tagging task. The top-performing model from the first phase is then used to annotate a new set of sentences. In the second phase, the model is trained further on this new dataset. Finally, we attain an F1 score of 86.52. The model may serve as a baseline for further study on DL-based Assamese POS tagging.
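The two-phase process described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LookupTagger` class, the `accuracy` helper, the English toy sentences, and the use of a "fallback tag" to stand in for different pre-trained embeddings are all assumptions made so the example runs without any deep learning library. The abstract also does not say whether the automatically annotated sentences were corrected before retraining; the sketch uses them as-is.

```python
from collections import Counter, defaultdict


class LookupTagger:
    """Hypothetical stand-in for a neural tagger: most frequent tag per word."""

    def __init__(self, fallback):
        self.fallback = fallback  # tag given to unseen words (stands in for an embedding choice)
        self.table = {}

    def fit(self, sentences):
        counts = defaultdict(Counter)
        for sent in sentences:
            for word, tag in sent:
                counts[word][tag] += 1
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        return self

    def tag(self, words):
        return [(w, self.table.get(w, self.fallback)) for w in words]


def accuracy(model, gold):
    """Token-level accuracy of `model` on gold-tagged sentences."""
    total = correct = 0
    for sent in gold:
        pred = model.tag([w for w, _ in sent])
        for (_, p), (_, g) in zip(pred, sent):
            total += 1
            correct += p == g
    return correct / total


# Toy data (assumed; the paper works with Assamese text).
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
dev = [[("the", "DET"), ("cat", "NOUN"), ("jumps", "VERB")]]
unlabeled = [["a", "dog", "sleeps"], ["the", "bird", "runs"]]

# Phase 1: train one model per configuration and keep the best on held-out data.
candidates = {fb: LookupTagger(fb).fit(train) for fb in ("NOUN", "VERB")}
scores = {name: accuracy(model, dev) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
best = candidates[best_name]

# The best phase-1 model annotates new, previously untagged sentences.
silver = [best.tag(words) for words in unlabeled]

# Phase 2: train further on the original data plus the newly annotated set.
final = LookupTagger(best_name).fit(train + silver)
```

In a real setting the comparison metric would be an F1 score over tags rather than plain accuracy, and each candidate would be a sequence model fed by a different pre-trained embedding.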
Pages: 8
Related papers
50 in total
  • [1] Part-of-speech Tagger for Assamese Using Ensembling Approach
    Pathak, Dhrubajyoti
    Nandi, Sukumar
    Sarmah, Priyankoo
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (10)
  • [2] Implementation of Automated Bengali Parts of Speech Tagger: An Approach Using Deep Learning Algorithm
    Patoary, Asraf Hossain
    Bin Kibria, Md Jahid
    Kaium, Abdul
    [J]. 2020 IEEE REGION 10 SYMPOSIUM (TENSYMP) - TECHNOLOGY FOR IMPACTFUL SUSTAINABLE DEVELOPMENT, 2020, : 308 - 311
  • [3] Development of Marathi Part of Speech Tagger Using Statistical Approach
    Singh, Jyoti
    Joshi, Nisheeth
    Mathur, Iti
    [J]. 2013 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2013, : 1554 - 1559
  • [4] Learning a Stochastic Part of Speech Tagger for Sinhala
    Jayasuriya, M.
    Weerasinghe, A. R.
    [J]. 2013 INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER), 2013, : 137 - 143
  • [5] Part-of-Speech Tagger for Biomedical Domain Using Deep Neural Network Architecture
    Gopalakrishnan, Athira
    Soman, K. P.
    Premjith, B.
    [J]. 2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019,
  • [6] Deep Learning Based Parts of Speech Tagger for Bengali
    Kabir, Md. Fasihul
    Abdullah-Al-Mamun, Khandaker
    Huda, Mohammad Nurul
    [J]. 2016 5TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS AND VISION (ICIEV), 2016, : 26 - 29
  • [7] Deep Learning based Tamil Parts of Speech (POS) Tagger
    Anbukkarasi, S.
    Varadhaganapathy, S.
    [J]. BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2021, 69 (06)
  • [8] On Part of Speech Tagger for Indonesian Language
    Yuwana, R. Sandra
    Yuliani, Asri R.
    Pardede, Hilman F.
    [J]. 2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 369 - 372
  • [9] Hybrid Part of Speech Tagger for Malayalam
    Francis, Merin
    Nair, K. N. Ramachandran
    [J]. 2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2014, : 1744 - 1750
  • [10] Deep Belief Network Based Part-of-Speech Tagger for Telugu Language
    Jagadeesh, M.
    Kumar, M. Anand
    Soman, K. P.
    [J]. PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 3, 2016, 381 : 75 - 84