An Annotated Multilingual Dataset to Study Modality in the Gospels

Cited by: 0

Authors:

Bermudez-Sabel, Helena ^{[1]}

Dell'Oro, Francesca ^{[1,2]}

Affiliations:

[1] University of Neuchâtel, Neuchâtel, Switzerland

[2] Swiss National Science Foundation, Bern, Switzerland

Source:

DIGITAL HUMANITIES QUARTERLY | 2024 / Vol. 18 / No. 01

Funding:

Swiss National Science Foundation;

Keywords:

DOI:

Not available

Chinese Library Classification:

C [Social Sciences, General];

Discipline classification codes:

03 ; 0303 ;

Abstract:

This paper presents a number of resources for examining the expression of modality in the Gospels. The main resource is an XML-TEI dataset that contains the linguistic annotation of a predefined list of potentially modal markers in both Ancient Greek and Latin. When one of these markers conveys a modal meaning, each constituent of the modal passage (i.e., the marker, its scope, and the modal relation between them) is annotated with a great level of detail through several linguistic features. One of the original features of our dataset is the implementation of a cross-referencing system that enables the alignment of the potentially modal markers of both languages. To facilitate the exploitation of our data by those unfamiliar with XML technologies, we also provide summary tables with the most relevant features of the annotation. In addition, a program written in Apache Ant allows any user to generate the summary sheets and to align modal passages in both Ancient Greek and Latin with any other language available in the Multilingual Bible Parallel Corpus [Christodouloupoulos and Steedman 2015]. This contribution presents the details of the semantic annotation and its formalization, and how our resources may be exploited within semantics and translation studies. In addition, the encoding strategies implemented are relevant for other projects dealing with the combination of multiple layers of (linguistic) annotation and/or tackling the development of parallel corpora.


