Rule Based Part of Speech Tagging of Sindhi Language

Cited by: 9
Authors
Mahar, Javed Ahmed [1 ]
Memon, Ghulam Qadir [2 ]
Affiliations
[1] Shah Abdul Latif Univ, Dept Comp Sci, Khairpur, Sindh, Pakistan
[2] Hamdard Univ, FEST, HIIT, Karachi, Pakistan
Keywords
Sindhi; Part of Speech; Morphology; Lexicon; Tagging Rules
DOI
10.1109/ICSAP.2010.27
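Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code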
0808 ; 0809 ;
Abstract
Part-of-speech (POS) tagging is the process of assigning the correct syntactic category to each word in a text. A tag set and word disambiguation rules are fundamental parts of any POS tagger. No tag set for the Sindhi language has been published so far, and no Sindhi lexicon for computational processing is available. In this study, a tag set for Sindhi POS, a lexicon, and word disambiguation rules are designed and developed. The Sindhi corpus is collected from a comprehensive Sindhi dictionary and is based on the most recent vocabulary used by local people. This paper presents the preliminary achievements of a rule-based Sindhi Part of Speech (SPOS) tagger. Tagging and tokenization algorithms are also designed for the implementation of SPOS. The outputs of SPOS are verified by a Sindhi linguist. The development of the SPOS tagger may be an important milestone towards computational Sindhi language processing.
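The pipeline the abstract outlines (tokenization, lexicon lookup, then rule-based disambiguation) can be sketched roughly as follows. This is a minimal illustration, not the SPOS implementation: the lexicon entries are English placeholders, and the tag names, the two contextual rules, and the function names `tokenize`, `disambiguate`, and `tag` are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon: each word maps to the set of tags it may take.
# Entries are English placeholders, not data from the paper.
LEXICON = {
    "the": {"DET"},
    "can": {"NOUN", "VERB"},
    "rusts": {"VERB"},
    "they": {"PRON"},
    "run": {"NOUN", "VERB"},
}


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def disambiguate(candidates, prev_tag):
    """Pick one tag from a candidate set using simple contextual rules."""
    if len(candidates) == 1:
        return next(iter(candidates))
    # Illustrative rule 1: after a determiner, prefer a noun reading.
    if prev_tag == "DET" and "NOUN" in candidates:
        return "NOUN"
    # Illustrative rule 2: after a pronoun, prefer a verb reading.
    if prev_tag == "PRON" and "VERB" in candidates:
        return "VERB"
    # Fallback: alphabetical choice keeps the output deterministic.
    return sorted(candidates)[0]


def tag(text):
    """Tag each token via lexicon lookup followed by rule-based disambiguation."""
    tagged = []
    prev_tag = None
    for token in tokenize(text):
        candidates = LEXICON.get(token)
        if candidates is None:
            # Tokens missing from the lexicon: punctuation or unknown word.
            is_punct = re.fullmatch(r"[^\w\s]", token) is not None
            candidates = {"PUNCT"} if is_punct else {"UNK"}
        chosen = disambiguate(candidates, prev_tag)
        tagged.append((token, chosen))
        prev_tag = chosen
    return tagged
```

Calling `tag("The can rusts")` resolves the ambiguous word "can" to a noun because it follows a determiner, while `tag("They run.")` resolves "run" to a verb because it follows a pronoun. A real tagger for Sindhi would need a script-aware tokenizer and the lexicon and rules developed in the paper, none of which are reproduced here.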
Pages: 101-106
Number of pages: 6
Related papers
50 in total
  • [21] Part of Speech Tagging for Kayah Language Using Hidden Markov Model
    Linn, Zar Zar
    Patil, Pushpa B.
    [J]. 2019 4TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2019, : 228 - 233
  • [22] Phrase-based part-of-speech tagging
    Finch, Andrew
    Sumita, Eiichiro
    [J]. PROCEEDINGS OF THE 2007 IEEE INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING (NLP-KE'07), 2007, : 215 - +
  • [23] Arabic Part of Speech Tagging
    Mohamed, Emad
    Kuebler, Sandra
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 2537 - 2543
  • [24] Part of speech tagging for Arabic
    Kuebler, Sandra
    Mohamed, Emad
    [J]. NATURAL LANGUAGE ENGINEERING, 2012, 18 : 521 - 548
  • [25] Part-of-speech tagging
    Martinez, Angel R.
    [J]. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL STATISTICS, 2012, 4 (01) : 107 - 113
  • [26] PART OF SPEECH TAGGING FOR POLISH
    Krasnowska-Kieras, Katarzyna
    Kobylinski, Lukasz
    [J]. POZNAN STUDIES IN CONTEMPORARY LINGUISTICS, 2019, 55 (02) : 211 - 237
  • [27] Part of Speech Tagging Using Part of Speech Sequence Graph
    Gholami-Dastgerdi P.
    Feizi-Derakhshi M.-R.
    [J]. Annals of Data Science, 2023, 10 (05) : 1301 - 1328
  • [29] A TENGRAM method based part-of-speech tagging of multi-category words in Hindi language
    Gupta, J. P.
    Tayal, Devendra K.
    Gupta, Arti
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (12) : 15084 - 15093
  • [30] Natural Language Requirements Specification Analysis Using Part-of-Speech Tagging
    Fatwanto, Agung
    [J]. 2013 SECOND INTERNATIONAL CONFERENCE ON FUTURE GENERATION COMMUNICATION TECHNOLOGY (FGCT 2013), 2013, : 98 - 102