Building a Time-Aligned Cross-Linguistic Reference Corpus from Language Documentation Data (DoReCo)

Authors
Paschen, Ludger [1]
Delafontaine, Francois [2]
Draxler, Christoph [3]
Fuchs, Susanne [1]
Stave, Matthew [2]
Seifart, Frank [1]
Affiliations
[1] Leibniz-Zentrum Allgemeine Sprachwissenschaft, Schützenstr. 18, D-10117 Berlin, Germany
[2] Laboratoire Dynamique du Langage, 14 Avenue Berthelot, F-69007 Lyon, France
[3] Bavarian Archive for Speech Signals, Schellingstr. 3, D-80799 Munich, Germany
Keywords
corpus creation; endangered languages; phonetic databases
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Natural speech data on many languages have been collected by language documentation projects aiming to preserve linguistic and cultural traditions in audiovisual records. These data hold great potential for large-scale cross-linguistic research into phonetics and language processing. Major obstacles to utilizing such data for typological studies include the non-homogeneous nature of file formats and annotation conventions found both across and within archived collections. Moreover, time-aligned audio transcriptions are typically available only at the level of broad (multi-word) phrases, not at the word and segment levels. We report on solutions developed for these issues within the DoReCo (DOcumentation REference COrpus) project. DoReCo aims at providing time-aligned transcriptions for at least 50 collections of under-resourced languages. This paper gives a preliminary overview of the current state of the project and details our workflow, in particular standardization of formats and conventions, the addition of segmental alignments with WebMAUS, and DoReCo's applicability for subsequent research programs. By making the data accessible to the scientific community, DoReCo is designed to bridge the gap between language documentation and linguistic inquiry.
Pages: 2657-2666
Page count: 10