Annotating the MASC Corpus with BabelNet

Cited by: 0

Authors:

Moro, Andrea [1]

Navigli, Roberto [1]

Tucci, Francesco Maria [1]

Passonneau, Rebecca J. [2]

Affiliations:

[1] Univ Roma La Sapienza, Dipartimento Informat, I-00185 Rome, Italy

[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY USA

Source:

LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014

Keywords:

Semantic Annotation; Named Entities; Word Senses; Lexical Ambiguity; Semantic Network; Disambiguation

DOI:

Not available

Chinese Library Classification (CLC) code:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

In this paper we tackle the problem of automatically annotating, with both word senses and named entities, the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text. We use BabelNet 2.0, a multilingual semantic network which integrates both lexicographic and encyclopedic knowledge, as our sense/entity inventory together with its semantic structure, to perform the aforementioned annotation task. Word sense annotated corpora have been around for more than twenty years, helping the development of Word Sense Disambiguation algorithms by providing both training and testing grounds. More recently, Entity Linking has followed the same path, with the creation of huge resources containing annotated named entities. However, to date, there has been no resource that contains both kinds of annotation. In this paper we present an automatic approach for performing this annotation, together with its output on the MASC corpus. We use this corpus because its goal of integrating different types of annotations aligns with our own. Our overall aim is to stimulate research on the joint exploitation and disambiguation of word senses and named entities. Finally, we estimate the quality of our annotations using both manually tagged named entities and word senses, obtaining an accuracy of roughly 70% for both named entity and word sense annotations.


Pages: 4214 - 4219

Number of pages: 6

