Annotating the MASC Corpus with BabelNet

Times cited: 0

Authors:

Moro, Andrea [1]

Navigli, Roberto [1]

Tucci, Francesco Maria [1]

Passonneau, Rebecca J. [2]

Affiliations:

[1] Univ Roma La Sapienza, Dipartimento Informat, I-00185 Rome, Italy

[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY USA

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Semantic Annotation; Named Entities; Word Senses; Lexical Ambiguity; Semantic Network; Disambiguation

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

In this paper we tackle the problem of automatically annotating the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text, with both word senses and named entities. To perform this annotation task, we use BabelNet 2.0, a multilingual semantic network which integrates both lexicographic and encyclopedic knowledge, as our sense/entity inventory, and we also exploit its semantic structure. Sense-annotated corpora have been around for more than twenty years, supporting the development of Word Sense Disambiguation algorithms by providing both training and test data. More recently, Entity Linking has followed the same path, with the creation of large resources containing annotated named entities. However, to date, there has been no resource that contains both kinds of annotation. In this paper we present an automatic approach for performing this annotation, together with its output on the MASC corpus. We chose this corpus because its goal of integrating different types of annotation aligns closely with our own. Our overall aim is to stimulate research on the joint exploitation and disambiguation of word senses and named entities. Finally, we estimate the quality of our annotations against manually tagged named entities and word senses, obtaining an accuracy of roughly 70% for both the named entity and the word sense annotations.


Pages: 4214-4219

Number of pages: 6
