Comparison of different POS tagging techniques (n-gram, HMM and Brill's tagger) for Bangla

Cited by: 21
Authors
Hasan, Fahim Muhammad
UzZaman, Naushad
Khan, Mumit
Keywords
POS tagging; POS tagger; Bangla; Bengali; n-gram; HMM; Brill's transformation-based tagger
DOI
10.1007/978-1-4020-6264-3_23
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Part-of-speech (POS) tagging is the problem of assigning a part-of-speech tag to each word of a text, and there are several approaches to it. In this paper we compare the performance of several POS tagging techniques for the Bangla language: statistical approaches (n-gram, HMM) and a transformation-based approach (Brill's tagger). A supervised POS tagger requires a large annotated training corpus to tag accurately. At this initial stage of POS tagging for Bangla, only a very limited annotated corpus is available. We examine which technique maximizes performance with this limited resource. We also evaluate the techniques on English and try to estimate how they might perform once a substantial annotated corpus becomes available.
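To make the n-gram idea concrete, the following is a minimal sketch of a unigram tagger and a bigram tagger with backoff, written with the Python standard library only. It is not the authors' implementation: the toy English corpus, the tag names, the `"<s>"` start marker, and the default tag `NOUN` are all assumptions chosen for illustration, and the HMM and Brill taggers are not covered.

```python
from collections import Counter, defaultdict

def train_unigram(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def train_bigram(tagged_sents):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}

def tag_words(words, unigram, bigram=None, default="NOUN"):
    """Tag left to right, backing off bigram -> unigram -> default tag."""
    out, prev = [], "<s>"
    for w in words:
        t = bigram.get((prev, w)) if bigram is not None else None
        if t is None:
            t = unigram.get(w, default)
        out.append((w, t))
        prev = t
    return out

def accuracy(tagged_sents, unigram, bigram=None):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in tagged_sents:
        pred = tag_words([w for w, _ in sent], unigram, bigram)
        for (_, gold), (_, guess) in zip(sent, pred):
            correct += gold == guess
            total += 1
    return correct / total

# Toy corpus invented for this sketch; a real experiment needs annotated data.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
    [("they", "PRON"), ("run", "VERB"), ("home", "NOUN")],
]
TEST = [
    [("the", "DET"), ("run", "NOUN")],
    [("they", "PRON"), ("walk", "VERB")],
]

unigram = train_unigram(TRAIN)
bigram = train_bigram(TRAIN)
print(accuracy(TEST, unigram))          # 0.5
print(accuracy(TEST, unigram, bigram))  # 0.75
```

In this toy setting the bigram tagger resolves the ambiguous word "run" from the preceding tag, while the unseen word "walk" falls through to the default tag in both taggers, which mirrors the data-sparsity concern raised in the abstract.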
Pages: 121-126
Page count: 6