Towards a Balanced Named Entity Corpus for Dutch

Cited by: 0

Authors:

Desmet, Bart [1, 2]

Hoste, Veronique [1, 2]

Affiliations:

[1] Univ Coll Ghent, Language & Translat Technol Team, B-9000 Ghent, Belgium

[2] Univ Ghent, Dept Appl Math & Comp Sci, B-9000 Ghent, Belgium

Source:

LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2010

Keywords:

DOI:

Not available

Chinese Library Classification number:

H [Language and Writing];

Discipline classification code:

05;

Abstract:

This paper introduces a new named entity corpus for Dutch. State-of-the-art named entity recognition systems require a substantial annotated corpus to be trained on. Such corpora exist for English, but not for Dutch. The STEVIN-funded SoNaR project aims to produce a diverse 500-million-word reference corpus of written Dutch, with four semantic annotation layers: named entities, coreference relations, semantic roles and spatiotemporal expressions. A 1-million-word subset will be manually corrected. Named entity annotation guidelines for Dutch were developed, adapted from the MUC and ACE guidelines. Adaptations include the annotation of products and events, the classification into subtypes, and the markup of metonymic usage. Inter-annotator agreement experiments were conducted to corroborate the reliability of the guidelines, which yielded satisfactory results (Kappa scores above 0.90). We are building a NER system, trained on the 1-million-word subcorpus, to automatically classify the remainder of the SoNaR corpus. To this end, experiments with various classification algorithms (MBL, SVM, CRF) and features have been carried out and evaluated.
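The abstract reports Kappa scores above 0.90 for inter-annotator agreement but does not say which Kappa variant was used. As a minimal sketch, assuming Cohen's kappa over two annotators' per-token labels, the statistic could be computed as follows. The function name `cohen_kappa` and the toy label sequences are hypothetical illustrations, not data from the SoNaR corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences (assumed variant)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two annotators'
    # marginal proportions, summed over all labels.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    # Both annotators used a single identical label throughout.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy annotations; "O" marks a token outside any entity.
annotator_1 = ["PER", "LOC", "ORG", "O", "O", "PER", "LOC", "O", "ORG", "O"]
annotator_2 = ["PER", "LOC", "ORG", "O", "O", "PER", "ORG", "O", "ORG", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters in named entity annotation because most tokens fall outside any entity and inflate simple percentage agreement.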


Pages: 7
