An Annotated Multilingual Dataset to Study Modality in the Gospels

Authors
Bermudez-Sabel, Helena [1]
Dell'Oro, Francesca [1,2]
Affiliations
[1] Univ Neuchatel, Neuchatel, Switzerland
[2] Swiss Natl Sci Fdn, Bern, Switzerland
Source
DIGITAL HUMANITIES QUARTERLY | 2024 / Volume 18 / Issue 01
Funding
Swiss National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
C [Social Sciences, General];
Discipline classification code
03 ; 0303 ;
Abstract
This paper presents a number of resources for examining the expression of modality in the Gospels. The main resource is an XML-TEI dataset that contains the linguistic annotation of a predefined list of potentially modal markers in both Ancient Greek and Latin. When one of these markers conveys a modal meaning, each constituent of the modal passage (i.e., the marker, its scope, and the modal relation between them) is annotated with a great level of detail through several linguistic features. One of the original features of our dataset is the implementation of a cross-referencing system that enables the alignment of the potentially modal markers of both languages. To facilitate the exploitation of our data by those unfamiliar with XML technologies, we also provide summary tables with the most relevant features of the annotation. In addition, a program written in Apache Ant allows any user to generate the summary sheets and to align modal passages in both Ancient Greek and Latin with any other language available in the Multilingual Bible Parallel Corpus [Christodouloupoulos and Steedman 2015]. This contribution presents the details of the semantic annotation and its formalization, and how our resources may be exploited within semantics and translation studies. In addition, the encoding strategies implemented are relevant for other projects dealing with the combination of multiple layers of (linguistic) annotation and/or tackling the development of parallel corpora.
Pages: 1 / 16
Page count: 16