A web-based Bengali news corpus for named entity recognition

Cited by: 28
Authors
Ekbal, Asif [1]
Bandyopadhyay, Sivaji [1]
Affiliations
[1] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata 700032, India
Keywords
web as corpus; news corpus; web-based tagged Bengali news corpus; named entity; named entity recognition
DOI
10.1007/s10579-008-9064-x
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The rapid development of language resources and tools using machine learning techniques for less computerized languages requires an appropriately tagged corpus. A tagged Bengali news corpus has been developed from the web archive of a widely read Bengali newspaper. A web crawler retrieves the web pages in Hyper Text Markup Language (HTML) format from the news archive. At present, the corpus contains approximately 34 million wordforms. Named Entity Recognition (NER) systems based on pattern-based shallow parsing, with or without linguistic knowledge, have been developed using part of this corpus. The NER system that uses linguistic knowledge performed better, yielding highest F-Score values of 75.40%, 72.30%, 71.37%, and 70.13% for person, location, organization, and miscellaneous names, respectively.
Pages: 173 - 182
Number of pages: 10
Related papers
  • [1] A web-based Bengali news corpus for named entity recognition
    Asif Ekbal
    Sivaji Bandyopadhyay
    Language Resources and Evaluation, 2008, 42 : 173 - 182
  • [2] A Finnish news corpus for named entity recognition
    Teemu Ruokolainen
    Pekka Kauppinen
    Miikka Silfverberg
    Krister Lindén
    Language Resources and Evaluation, 2020, 54 : 247 - 272
  • [3] A Finnish news corpus for named entity recognition
    Ruokolainen, Teemu
    Kauppinen, Pekka
    Silfverberg, Miikka
    Linden, Krister
    LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (01) : 247 - 272
  • [4] Named Entity Recognition and transliteration in Bengali
    Ekbal, Asif
    Naskar, Sudip Kumar
    Bandyopadhyay, Sivaji
    LINGUISTICAE INVESTIGATIONES, 2007, 30 (01) : 95 - 114
  • [5] DeepTagger: Knowledge Enhanced Named Entity Recognition for Web-Based Ads Queries
    Zuo, Simiao
    Tang, Pengfei
    Hu, Xinyu
    Lou, Qiang
    Jiao, Jian
    Charles, Denis
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023 : 5002 - 5009
  • [6] Three different models for named entity recognition in Bengali
    Ekbal, Asif
    PROGRESS IN PATTERN RECOGNITION, 2007 : 161 - 170
  • [7] Named entity recognition in Bengali using system combination
    Ekbal, Asif
    Bandyopadhyay, Sivaji
    LINGUISTICAE INVESTIGATIONES, 2014, 37 (01) : 1 - 22
  • [8] Bengali Named Entity Recognition using Classifier Combination
    Ekbal, Asif
    Bandyopadhyay, Sivaji
    ICAPR 2009: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCES IN PATTERN RECOGNITION, PROCEEDINGS, 2009 : 259 - 262
  • [9] BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali)
    Sazzed, Salim
    PROCEEDINGS OF THE 21ST WORKSHOP ON BIOMEDICAL LANGUAGE PROCESSING (BIONLP 2022), 2022 : 323 - 329
  • [10] A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News
    Jabbari, Ali
    Sauvage, Olivier
    Zeine, Hamada
    Chergui, Hamza
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020 : 2293 - 2299