Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Cited by: 0

Authors:

Begus, Gasper [1]

Zhou, Alan [1]

Affiliation:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

Source:

INTERSPEECH 2022 | 2022

Keywords:

REPRESENTATIONS; GENERATION

DOI:

10.21437/Interspeech.2022-11219

Chinese Library Classification:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN [1]) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence emerges in favor of lexical learning and of a causal relationship between latent codes and meaningful sublexical units. The architecture that combines the production and perception principles is thus able to learn to decode unique information from raw acoustic data without accessing real training data directly. We propose a technique to explore lexical (holistic) and sublexical (featural) learned representations in the classifier network. The results bear implications for unsupervised speech technology, as well as for unsupervised semantic modeling as language models increasingly bypass text and operate from raw acoustics.

Pages: 5298-5302

Number of pages: 5
