Rule Based Part of Speech Tagging of Sindhi Language

Cited by: 9
Authors
Mahar, Javed Ahmed [1]
Memon, Ghulam Qadir [2]
Affiliations
[1] Shah Abdul Latif Univ, Dept Comp Sci, Khairpur, Sindh, Pakistan
[2] Hamdard Univ, FEST, HIIT, Karachi, Pakistan
Keywords
Sindhi; Part of Speech; Morphology; Lexicon; Tagging Rules;
DOI
10.1109/ICSAP.2010.27
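Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic and communication technology];
Discipline classification code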
0808 ; 0809 ;
Abstract
Part of Speech (POS) tagging is the process of assigning the correct syntactic category to each word in a text. A tag set and word disambiguation rules are fundamental parts of any POS tagger. No tag set for the Sindhi language has been published to date, and no Sindhi lexicon for computational processing is available. In this study, a tag set for Sindhi POS, a lexicon, and word disambiguation rules are designed and developed. The Sindhi corpus is collected from a comprehensive Sindhi dictionary and is based on the most recent available vocabulary used by local people. This paper presents the preliminary achievements of a rule-based Sindhi Part of Speech (SPOS) tagger. Tokenization and tagging algorithms are also designed for the implementation of SPOS. The outputs of SPOS are verified by a Sindhi linguist. The development of the SPOS tagger may be an important milestone towards computational processing of the Sindhi language.
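The pipeline named in the abstract (tokenization, lexicon lookup, then word disambiguation rules) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English lexicon, the tag names, the two context rules, and the function names `tokenize`, `disambiguate` and `tag` are hypothetical and do not reproduce the paper's Sindhi tag set, lexicon, or algorithms.

```python
import re

# Hypothetical toy lexicon: each word maps to the set of tags it may take.
# Entries and tag names are illustrative, not the paper's Sindhi tag set.
LEXICON = {
    "the": {"DET"},
    "can": {"NOUN", "VERB"},
    "rusts": {"VERB"},
    "we": {"PRON"},
}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def disambiguate(candidates, prev_tag):
    """Choose one tag from the candidates using the previous tag as context."""
    if len(candidates) == 1:
        return next(iter(candidates))
    # Assumed rule 1: after a determiner, prefer the noun reading.
    if prev_tag == "DET" and "NOUN" in candidates:
        return "NOUN"
    # Assumed rule 2: after a pronoun, prefer the verb reading.
    if prev_tag == "PRON" and "VERB" in candidates:
        return "VERB"
    # Fallback: alphabetical choice, so the output is reproducible.
    return sorted(candidates)[0]


def tag(text):
    """Return a list of (token, tag) pairs for the input text."""
    tagged = []
    prev_tag = None
    for token in tokenize(text):
        if token in LEXICON:
            current = disambiguate(LEXICON[token], prev_tag)
        elif re.fullmatch(r"[^\w\s]", token):
            current = "PUNCT"
        else:
            current = "UNK"  # word missing from the lexicon
        tagged.append((token, current))
        prev_tag = current
    return tagged


print(tag("The can rusts."))
# [('the', 'DET'), ('can', 'NOUN'), ('rusts', 'VERB'), ('.', 'PUNCT')]
```

In this sketch the ambiguous word "can" is resolved to a noun only because a determiner precedes it; after "we" the same word would be tagged as a verb. A real tagger for Sindhi would need a script-aware tokenizer and a far larger rule set.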
Pages: 101-106
Page count: 6
Related papers
  • [1] Parts of Speech Tagging of Romanized Sindhi Text by applying Rule Based Model
    Sodhar, Irum Naz
    Jalbani, Akhtar Hussain
    Channa, Muhammad Ibrahim
    Hakro, Dil Nawaz
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2019, 19 (11): 91 - 96
  • [2] Hidden Markov Model with Rule Based Approach for Part of Speech Tagging of Myanmar Language
    Zin, Khine Khine
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND INFORMATION TECHNOLOGY, 2009, : 123 - +
  • [3] The computational complexity of rule-based part-of-speech tagging
    Oliva, K
    Kveton, P
    Ondruska, R
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2003, 2807 : 82 - 89
  • [4] Part-of-Speech Tagging for Azerbaijani Language
    Mammadov, Samir
    Rustamov, Samir
    Mustafali, Ali
    Sadigov, Ziyaddin
    Mollayev, Rasim
    Mammadov, Zamir
    [J]. 2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2018, : 40 - 45
  • [5] Hidden Markov Model Based Part of Speech Tagging for Nepali Language
    Paul, Abhijit
    Purkayastha, Bipul Syam
    Sarkar, Sunita
    [J]. 2015 INTERNATIONAL SYMPOSIUM ON ADVANCED COMPUTING AND COMMUNICATION (ISACC), 2015, : 149 - 156
  • [6] Transformation-based part-of-speech tagging for Serbian language
    Delic, Vlado
    Secujski, Milan
    Kupusinac, Aleksandar
    [J]. PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, MAN-MACHINE SYSTEMS AND CYBERNETICS (CIMMACS '09), 2009, : 98 - +
  • [7] Rule Based Approach for Arabic Part of Speech Tagging and Name Entity Recognition
    Btoush, Mohammad Hjouj
    Alarabeyyat, Abdulsalam
    Olab, Isa
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (06) : 331 - 335
  • [8] Part-of-Speech Tagging with Rule-Based Data Preprocessing and Transformer
    Li, Hongwei
    Mao, Hongyan
    Wang, Jingzi
    [J]. ELECTRONICS, 2022, 11 (01)
  • [9] Part-of-Speech (POS) Tagging for the Nyishi Language
    Siram, Joyir
    Sambyo, Koj
    Sarkar, Achyuth
    [J]. ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 191 - 199
  • [10] Transformation Rule Learning without Rule Templates: A Case Study in Part of Speech Tagging
    Bach, Ngo Xuan
    Cuong, Le Anh
    Ha, Nguyen Viet
    Binh, Nguyen Ngoc
    [J]. ALPIT 2008: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, PROCEEDINGS, 2008, : 9 - +