An Annotated Multilingual Dataset to Study Modality in the Gospels

Authors
Bermudez-Sabel, Helena [1]
Dell'Oro, Francesca [1,2]
Affiliations
[1] University of Neuchâtel, Neuchâtel, Switzerland
[2] Swiss National Science Foundation, Bern, Switzerland
Source
Digital Humanities Quarterly | 2024 / Vol. 18 / No. 1
Funding
Swiss National Science Foundation
DOI
Not yet available
Chinese Library Classification
C [Social Sciences, General]
Discipline classification codes
03; 0303
Abstract
This paper presents a number of resources for examining the expression of modality in the Gospels. The main resource is an XML-TEI dataset containing the linguistic annotation of a predefined list of potentially modal markers in both Ancient Greek and Latin. When one of these markers conveys a modal meaning, each constituent of the modal passage (i.e., the marker, its scope, and the modal relation between them) is annotated in great detail through several linguistic features. One original feature of our dataset is a cross-referencing system that aligns the potentially modal markers of the two languages. To make the data accessible to those unfamiliar with XML technologies, we also provide summary tables with the most relevant features of the annotation. In addition, a program written in Apache Ant allows any user to generate the summary sheets and to align modal passages in Ancient Greek and Latin with any other language available in the Multilingual Bible Parallel Corpus [Christodouloupoulos and Steedman 2015]. This contribution details the semantic annotation and its formalization, and shows how our resources may be exploited in semantics and translation studies. Finally, the encoding strategies implemented are relevant to other projects that combine multiple layers of (linguistic) annotation and/or develop parallel corpora.
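The general idea of cross-referencing markers across two languages can be illustrated with a minimal Python sketch. Everything about the markup here is an assumption: the element and attribute choices (`<seg type="marker">`, `xml:id`, and TEI's generic `@corresp` pointer) are illustrative and are not taken from the dataset's actual schema, and the function `align_markers` and the toy Greek/Latin sample are hypothetical. The sketch indexes every marker by its `xml:id`, then follows each `@corresp` pointer to pair a marker in one language with its counterpart in the other.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the predeclared XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical toy sample: NOT the dataset's actual markup.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text>
    <body>
      <ab xml:lang="grc">
        <seg type="marker" xml:id="grc-m1">δεῖ</seg>
      </ab>
      <ab xml:lang="la">
        <seg type="marker" xml:id="la-m1" corresp="#grc-m1">oportet</seg>
      </ab>
    </body>
  </text>
</TEI>"""


def align_markers(tei_xml: str) -> list[tuple[str, str]]:
    """Pair each marker that carries a @corresp pointer with its target.

    Hypothetical helper: returns (target text, pointing marker text) pairs.
    """
    root = ET.fromstring(tei_xml)
    ns = {"tei": TEI_NS}
    # Index every marker by its xml:id so pointers resolve in O(1).
    markers = {
        seg.get(XML_ID): seg
        for seg in root.iterfind(".//tei:seg[@type='marker']", ns)
    }
    pairs = []
    for seg in markers.values():
        target = seg.get("corresp")
        # Only follow local pointers of the form "#some-id".
        if target and target.startswith("#"):
            other = markers.get(target[1:])
            if other is not None:
                pairs.append((other.text, seg.text))
    return pairs


if __name__ == "__main__":
    print(align_markers(SAMPLE))
```

Resolving pointers through an id index keeps the lookup linear in the number of markers; the authors' Apache Ant program may implement alignment quite differently.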
Pages: 1-16
Page count: 16