Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Times cited: 0

Authors:

Begus, Gasper [1]

Zhou, Alan [1]

Affiliation:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

INTERSPEECH 2022 | 2022

Keywords:

REPRESENTATIONS; GENERATION

DOI:

10.21437/Interspeech.2022-11219

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN [1]) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence in favor of lexical learning and a causal relationship between latent codes and meaningful sublexical units emerge. The architecture that combines the production and perception principles is thus able to learn to decode unique information from raw acoustic data without accessing real training data directly. We propose a technique to explore lexical (holistic) and sublexical (featural) learned representations in the classifier network. The results bear implications for unsupervised speech technology, as well as for unsupervised semantic modeling as language models increasingly bypass text and operate from raw acoustics.
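The abstract names two architectures, ciwGAN and fiwGAN [1], but does not say how lexical information is encoded in them. Below is a minimal sketch of the latent-code side only. It assumes the InfoGAN-style setup of [1]:

- The generator receives a latent code concatenated with random noise.
- A separate classifier (Q) network is trained to recover that code from the generated audio. Q only ever sees generated audio, never the real training data.
- The ciwGAN-style code is one-hot, with one dimension per lexical item. The fiwGAN-style code is binary ("featural"), so n bits can label up to 2^n items.

All function names are hypothetical, as are the uniform noise range, the bit ordering, and the choice to sum the per-bit loss. The networks themselves are omitted, and Q's output is represented only by a list of logits.

```python
import math
import random
from typing import List


def ciwgan_code(word_index: int, num_words: int) -> List[float]:
    """Categorical (ciwGAN-style) code: one-hot, one dimension per lexical item."""
    if not 0 <= word_index < num_words:
        raise ValueError("word_index out of range")
    code = [0.0] * num_words
    code[word_index] = 1.0
    return code


def fiwgan_code(word_index: int, num_bits: int) -> List[float]:
    """Featural (fiwGAN-style) code: num_bits binary digits label up to 2**num_bits items."""
    if not 0 <= word_index < 2 ** num_bits:
        raise ValueError("word_index does not fit in num_bits")
    # Least significant bit first (an arbitrary, illustrative ordering).
    return [float((word_index >> b) & 1) for b in range(num_bits)]


def generator_input(code: List[float], noise_dim: int, rng: random.Random) -> List[float]:
    """Concatenate the latent code with uniform noise in [-1, 1] (assumed range)."""
    return code + [rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]


def softmax_cross_entropy(logits: List[float], code: List[float]) -> float:
    """Q loss for a one-hot code: cross-entropy of softmax(logits) against the code."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(c * (x - log_z) for x, c in zip(logits, code))


def sigmoid_cross_entropy(logits: List[float], code: List[float]) -> float:
    """Q loss for a binary code: per-bit sigmoid cross-entropy, summed over bits."""
    # Numerically stable form of -[c*log(sigmoid(x)) + (1 - c)*log(1 - sigmoid(x))].
    return sum(max(x, 0.0) - x * c + math.log1p(math.exp(-abs(x)))
               for x, c in zip(logits, code))


def decode_categorical(logits: List[float]) -> int:
    """Read a lexical item off Q's logits: the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def decode_featural(logits: List[float]) -> int:
    """Threshold each bit at logit 0 (probability 0.5) and reassemble the integer."""
    return sum(1 << b for b, x in enumerate(logits) if x > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    z = generator_input(fiwgan_code(5, 3), noise_dim=97, rng=rng)
    print(len(z), z[:3])
    print(decode_featural([2.0, -1.5, 0.7]))
    print(decode_categorical([0.1, 3.0, -2.0]))
```

Running the script prints `100 [1.0, 0.0, 1.0]`, then `5`, then `1`. The contrast between the two codes is size: three featural bits already distinguish eight items, where a one-hot code would need eight dimensions.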


Pages: 5298-5302

Number of pages: 5
