WAV2GLOSS: Generating Interlinear Glossed Text from Speech

Cited by: 0
Authors
He, Taiqi [1 ]
Choi, Kwanghee [1 ]
Tjuatja, Lindia [1 ]
Robinson, Nathaniel R. [2 ]
Shi, Jiatong [1 ]
Neubig, Graham [1 ]
Mortensen, David R. [1 ]
Levin, Lori [1 ]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Funding
National Science Foundation (USA)
Keywords
(none listed)
DOI
(none available)
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Thousands of the world's languages are in danger of extinction, a tremendous threat to cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of linguistic annotation that can support documentation and resource creation for these languages' communities. IGT typically consists of (1) transcriptions, (2) morphological segmentation, (3) glosses, and (4) free translations into a majority language. We propose WAV2GLOSS, a task in which these four annotation components are extracted automatically from speech, and introduce the first dataset for this purpose, FIELDWORK: a corpus of speech with all of these annotations, derived from the work of field linguists, covering 37 languages, with standard formatting and train/dev/test splits. We provide various baselines (end-to-end versus cascaded, monolingual versus multilingual, and single-task versus multi-task approaches) to lay the groundwork for future research on IGT generation from speech.
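To make the four IGT tiers concrete, the following is a minimal sketch of how a single utterance record might be represented, assuming a simple dictionary layout; the field names and the toy sentence are illustrative inventions, not the actual FIELDWORK schema.

# A minimal, hypothetical Python sketch of one IGT record. The field names
# and the made-up sentence below are illustrative assumptions only, not the
# actual FIELDWORK schema.
igt_record = {
    "transcription": "xintij li wa",       # (1) transcription of the audio
    "segmentation": "x-in-tij li wa",      # (2) morpheme segmentation ('-' marks boundaries)
    "gloss": "COM-1SG-eat DET tortilla",   # (3) per-morpheme glosses aligned to (2)
    "translation": "I ate the tortilla.",  # (4) free translation into a majority language
}

# The WAV2GLOSS task maps speech audio to all four tiers, either with a
# single end-to-end model or with a cascade (speech recognition first,
# then the text-based tiers).

In this layout, tiers (2) and (3) are token-aligned, which is what makes IGT useful for morphological analysis; tier (4) is a free translation and need not align word-for-word.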
Pages: 568-582
Page count: 15