Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo)

Authors
Paschen, Ludger [1]
Delafontaine, Francois [2]
Draxler, Christoph [3]
Fuchs, Susanne [1]
Stave, Matthew [2]
Seifart, Frank [1]
Affiliations
[1] Leibniz-Zentrum Allgemeine Sprachwissenschaft, Schützenstr. 18, D-10117 Berlin, Germany
[2] Laboratoire Dynamique du Langage, 14 Avenue Berthelot, F-69007 Lyon, France
[3] Bavarian Archive for Speech Signals, Schellingstr. 3, D-80799 Munich, Germany
Keywords
corpus creation; endangered languages; phonetic databases
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Natural speech data on many languages have been collected by language documentation projects aiming to preserve linguistic and cultural traditions in audiovisual records. These data hold great potential for large-scale cross-linguistic research into phonetics and language processing. Major obstacles to utilizing such data for typological studies include the non-homogeneous nature of file formats and annotation conventions found both across and within archived collections. Moreover, time-aligned audio transcriptions are typically only available at the level of broad (multi-word) phrases but not at the word and segment levels. We report on solutions developed for these issues within the DoReCo (DOcumentation REference COrpus) project. DoReCo aims at providing time-aligned transcriptions for at least 50 collections of under-resourced languages. This paper gives a preliminary overview of the current state of the project and details our workflow, in particular standardization of formats and conventions, the addition of segmental alignments with WebMAUS, and DoReCo's applicability for subsequent research programs. By making the data accessible to the scientific community, DoReCo is designed to bridge the gap between language documentation and linguistic inquiry.
Pages: 2657-2666
Page count: 10