Annotating the MASC Corpus with BabelNet

Cited by: 0
Authors
Moro, Andrea [1]
Navigli, Roberto [1]
Tucci, Francesco Maria [1]
Passonneau, Rebecca J. [2]
Affiliations
[1] Univ Roma La Sapienza, Dipartimento Informat, I-00185 Rome, Italy
[2] Columbia Univ, Ctr Computat Learning Syst, New York, NY USA
Keywords
Semantic Annotation; Named Entities; Word Senses; Lexical Ambiguity; Semantic Network; Disambiguation
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
In this paper we tackle the problem of automatically annotating the MASC 3.0 corpus, a large English corpus covering a wide range of genres of written and spoken text, with both word senses and named entities. To perform this annotation we use BabelNet 2.0, a multilingual semantic network that integrates lexicographic and encyclopedic knowledge, as our sense and entity inventory, together with its semantic structure. Word-sense-annotated corpora have been around for more than twenty years, supporting the development of Word Sense Disambiguation algorithms by providing both training and testing grounds. More recently, Entity Linking has followed the same path, with the creation of huge resources containing annotated named entities. To date, however, there has been no resource that contains both kinds of annotation. We present an automatic approach for performing this annotation, together with its output on the MASC corpus. We use this corpus because its goal of integrating different types of annotation matches our own. Our overall aim is to stimulate research on the joint exploitation and disambiguation of word senses and named entities. Finally, we estimate the quality of our annotations against manually tagged named entities and word senses, obtaining an accuracy of roughly 70% for both named entity and word sense annotations.
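The abstract reports accuracy against manually tagged annotations but does not say how the score was computed. As a minimal sketch, assuming accuracy means the fraction of manually tagged spans whose automatic label matches the manual one, the calculation could look like the following. The function name, the span keys, and the label strings are hypothetical and are not taken from the paper, BabelNet, or MASC.

```python
from typing import Dict, Hashable


def annotation_accuracy(gold: Dict[Hashable, str],
                        predicted: Dict[Hashable, str]) -> float:
    """Fraction of gold-annotated spans whose predicted label equals the gold label.

    Assumption: a span missing from `predicted` counts as an error.
    """
    if not gold:
        raise ValueError("gold annotations must not be empty")
    correct = sum(1 for span, label in gold.items()
                  if predicted.get(span) == label)
    return correct / len(gold)


# Hypothetical toy data: keys are (start, end) character offsets,
# values are made-up sense/entity identifiers.
gold = {
    (0, 5): "sense:bank_1",
    (10, 16): "entity:Rome",
    (20, 24): "sense:play_3",
    (30, 38): "entity:Columbia",
}
predicted = {
    (0, 5): "sense:bank_1",    # matches gold
    (10, 16): "entity:Rome",   # matches gold
    (20, 24): "sense:play_1",  # wrong sense
    # (30, 38) is not annotated, so it counts as an error
}

accuracy = annotation_accuracy(gold, predicted)
print(f"accuracy = {accuracy:.2f}")  # prints "accuracy = 0.50"
```

Scoring word senses and named entities separately, as the abstract does, would amount to calling the function once per annotation type on the corresponding subset of spans.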
Pages: 4214-4219
Page count: 6