WAV2GLOSS: Generating Interlinear Glossed Text from Speech

Authors
He, Taiqi [1]
Choi, Kwanghee [1]
Tjuatja, Lindia [1]
Robinson, Nathaniel R. [2]
Shi, Jiatong [1]
Neubig, Graham [1]
Mortensen, David R. [1]
Levin, Lori [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA)
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Thousands of the world's languages are in danger of extinction, a tremendous threat to cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of linguistic annotation that can support documentation and resource creation for these languages' communities. IGT typically consists of (1) transcriptions, (2) morphological segmentation, (3) glosses, and (4) free translations to a majority language. We propose WAV2GLOSS, a task in which these four annotation components are extracted automatically from speech, and introduce the first dataset to this end, FIELDWORK, a corpus of speech with all these annotations, derived from the work of field linguists, covering 37 languages, with standard formatting and train/dev/test splits. We provide various baselines to lay the groundwork for future research on IGT generation from speech, comparing end-to-end versus cascaded, monolingual versus multilingual, and single-task versus multi-task approaches.
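The four IGT components named in the abstract can be pictured as one record per utterance. The following is a minimal sketch only: the class name `IGTRecord`, its field names, the hyphen-as-morpheme-boundary convention, and the Spanish example are all illustrative assumptions, not the FIELDWORK data format or anything taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IGTRecord:
    """Hypothetical container for the four IGT tiers listed in the abstract."""
    transcription: str   # (1) surface transcription
    segmentation: str    # (2) morphological segmentation, morphemes joined by "-"
    glosses: str         # (3) one gloss per morpheme, also joined by "-"
    translation: str     # (4) free translation into a majority language

    def aligned_morphemes(self) -> List[Tuple[str, str]]:
        """Pair each morpheme with its gloss, checking that the tiers line up."""
        seg_words = self.segmentation.split()
        gloss_words = self.glosses.split()
        if len(seg_words) != len(gloss_words):
            raise ValueError("segmentation and gloss tiers differ in word count")
        pairs: List[Tuple[str, str]] = []
        for seg_word, gloss_word in zip(seg_words, gloss_words):
            morphemes = seg_word.split("-")
            glosses = gloss_word.split("-")
            if len(morphemes) != len(glosses):
                raise ValueError(f"misaligned word: {seg_word!r} vs {gloss_word!r}")
            pairs.extend(zip(morphemes, glosses))
        return pairs


# Illustrative example, not drawn from the dataset.
example = IGTRecord(
    transcription="los gatos duermen",
    segmentation="los gato-s duerm-en",
    glosses="DET.PL cat-PL sleep-3PL",
    translation="The cats sleep.",
)

for morpheme, gloss in example.aligned_morphemes():
    print(f"{morpheme}\t{gloss}")
```

Under this assumed layout, a WAV2GLOSS system would take audio as input and be expected to fill in all four fields.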
Pages: 568-582
Page count: 15