Comparison of different POS tagging techniques (n-gram, HMM and Brill's tagger) for Bangla

Cited by: 21

Authors:

Hasan, Fahim Muhammad

UzZaman, Naushad

Khan, Mumit

Affiliation:

Source:

ADVANCES AND INNOVATIONS IN SYSTEMS, COMPUTING SCIENCES AND SOFTWARE ENGINEERING | 2007

Keywords:

POS tagging; POS tagger; Bangla; Bengali; n-gram; HMM; Brill's transformation based tagger;

DOI:

10.1007/978-1-4020-6264-3_23

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Part-of-speech (POS) tagging is the task of assigning a part-of-speech tag to each word of a text, and there are several approaches to it. In this paper we compare the performance of a few POS tagging techniques for the Bangla language, including statistical approaches (n-gram, HMM) and a transformation-based approach (Brill's tagger). A supervised POS tagging approach requires a large annotated training corpus to tag properly. At this initial stage of POS tagging for Bangla, we have only a very limited annotated corpus. We tried to see which technique maximizes performance with this limited resource. We also checked the performance for English and tried to conclude how these techniques might perform if we can obtain a substantial annotated corpus.
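
As a rough illustration of the n-gram family of taggers named in the abstract, the following is a minimal sketch of a unigram tagger and a bigram tagger with unigram backoff, scored by token accuracy. It is not the authors' implementation: the toy English sentences, the tag names, the `NOUN` default for unknown words, and all function names are assumptions made for this example. The HMM and Brill taggers are not shown.

```python
from collections import Counter, defaultdict

# Toy, hand-made data (hypothetical; not taken from the paper's corpus).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("cats", "NOUN"), ("bark", "VERB")],
    [("a", "DET"), ("bark", "NOUN"), ("fell", "VERB")],
]
TEST = [
    [("the", "DET"), ("bark", "NOUN"), ("fell", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train_unigram(sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def train_bigram(sentences):
    """Map (previous tag, word) to the most frequent tag in that context."""
    counts = defaultdict(Counter)
    for sent in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sent:
            counts[(prev, word)][tag] += 1
            prev = tag
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


def tag_words(words, unigram, bigram=None, default="NOUN"):
    """Tag left to right: bigram context first, then unigram, then a default."""
    tags, prev = [], "<s>"
    for word in words:
        t = bigram.get((prev, word)) if bigram is not None else None
        if t is None:
            t = unigram.get(word, default)
        tags.append(t)
        prev = t
    return tags


def accuracy(sentences, unigram, bigram=None):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        words = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tag_words(words, unigram, bigram)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


unigram_model = train_unigram(TRAIN)
bigram_model = train_bigram(TRAIN)
print("unigram accuracy:", accuracy(TEST, unigram_model))
print("bigram+backoff accuracy:", accuracy(TEST, bigram_model and unigram_model, bigram_model))
```

In this toy setup the word "bark" is more often a verb, so the unigram tagger mislabels it after a determiner, while the bigram tagger uses the previous tag to pick the noun reading. With a corpus this small such differences say nothing about real Bangla or English performance; the sketch only shows how the comparison can be set up.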

Pages: 121-126

Page count: 6