Annotating the MASC Corpus with BabelNet

Cited by: 0

Authors:

Moro, Andrea [1]

Navigli, Roberto [1]

Tucci, Francesco Maria [1]

Passonneau, Rebecca J. [2]

Affiliations:

[1] Univ Roma La Sapienza, Dipartimento Informat, I-00185 Rome, Italy

[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY USA

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Semantic Annotation; Named Entities; Word Senses; Lexical Ambiguity; Semantic Network; Disambiguation

DOI:

Not available

Chinese Library Classification:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

In this paper we tackle the problem of automatically annotating, with both word senses and named entities, the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text. We use BabelNet 2.0, a multilingual semantic network which integrates both lexicographic and encyclopedic knowledge, as our sense/entity inventory together with its semantic structure, to perform the aforementioned annotation task. Word sense annotated corpora have been around for more than twenty years, helping the development of Word Sense Disambiguation algorithms by providing both training and testing grounds. More recently Entity Linking has followed the same path, with the creation of huge resources containing annotated named entities. However, to date, there has been no resource that contains both kinds of annotation. In this paper we present an automatic approach for performing this annotation, together with its output on the MASC corpus. We use this corpus because its goal of integrating different types of annotations goes exactly in our same direction. Our overall aim is to stimulate research on the joint exploitation and disambiguation of word senses and named entities. Finally, we estimate the quality of our annotations using both manually-tagged named entities and word senses, obtaining an accuracy of roughly 70% for both named entities and word sense annotations.

Pages: 4214 - 4219

Page count: 6