AsNER - Annotated Dataset and Baseline for Assamese Named Entity Recognition

Times cited: 0

Authors:

Pathak, Dhrubajyoti [1]

Nandi, Sukumar [1]

Sarmah, Priyankoo [1]

Affiliation:

[1] Indian Inst Technol Guwahati, North Guwahati, India

Source:

LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022

Keywords:

NER dataset; Language Resources; Assamese NER; Assamese Language; Named Entity Recognition; NER model; AsNER;

DOI:

Not available

Chinese Library Classification:

TP39 [Computer applications];

Subject classification codes:

081203 ; 0835 ;

Abstract:

We present AsNER, a named entity annotation dataset for the low-resource Assamese language, together with a baseline Assamese NER model. The dataset contains about 99k tokens comprising text from the speech of the Prime Minister of India and from an Assamese play. It also contains person names, location names and addresses. The proposed NER dataset is likely to be a significant resource for deep neural network-based Assamese language processing. We benchmark the dataset by training and evaluating supervised named entity recognition (NER) models with state-of-the-art embedding methods such as FastText, BERT, XLM-R, FLAIR and MuRIL. We implement several baseline approaches using the state-of-the-art Bi-LSTM-CRF sequence tagging architecture. The best baseline achieves an F1-score of 80.69% when MuRIL is used as the word embedding method. The annotated dataset and the top-performing model are made publicly available.
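The abstract reports its headline result as an F1-score, which for NER is usually computed over whole entity spans rather than over individual tokens. The sketch below illustrates that metric under stated assumptions: it assumes the common BIO tagging scheme and exact-match entity-level scoring, since the record does not describe the paper's tag scheme or evaluation script. The function names and the PER/LOC tags are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Set, Tuple

# (start, end_exclusive, entity_type); only exact span matches count as correct.
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence tagged in the BIO scheme (assumed)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype = ""
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span at sentence end
        prefix, _, label = tag.partition("-")
        # Close the open span on "O", on a new "B-", or on an "I-" of a different type.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((start, i, etype))
            start = None
        # "B-" starts a span; a stray "I-" with no open span is also treated as a start.
        if prefix == "B" or (prefix == "I" and start is None):
            start, etype = i, label
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for g_tags, p_tags in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g_spans & p_spans)  # a hit needs the same boundaries and type
        n_pred += len(p_spans)
        n_gold += len(g_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag (for contrast)."""
    pairs = [(g, p) for gs, ps in zip(gold, pred) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "O"]]  # the location entity is missed
    p, r, f = entity_f1(gold, pred)
    acc = token_accuracy(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f} token_acc={acc:.2f}")
    # prints "P=1.00 R=0.50 F1=0.67 token_acc=0.75"
```

In this toy example, three of four tokens are tagged correctly (accuracy 0.75), but only one of two entities is recovered (F1 about 0.67). Because most tokens in NER data are "O", token accuracy tends to overstate performance, which is why span-level F1 is the usual headline figure.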


Pages: 6571-6577

Number of pages: 7
