Modeling speech recognition and synthesis simultaneously: Encoding and decoding lexical and sublexical semantic information into speech with no direct access to speech data

Cited by: 0
Authors
Begus, Gasper [1]
Zhou, Alan [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Source
Keywords
REPRESENTATIONS; GENERATION
DOI
10.21437/Interspeech.2022-11219
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Human speakers encode information into raw speech which is then decoded by the listeners. This complex relationship between encoding (production) and decoding (perception) is often modeled separately. Here, we test how encoding and decoding of lexical semantic information can emerge automatically from raw speech in unsupervised generative deep convolutional networks that combine the production and perception principles of speech. We introduce, to our knowledge, the most challenging objective in unsupervised lexical learning: a network that must learn unique representations for lexical items with no direct access to training data. We train several models (ciwGAN and fiwGAN [1]) and test how the networks classify acoustic lexical items in unobserved test data. Strong evidence in favor of lexical learning and a causal relationship between latent codes and meaningful sublexical units emerge. The architecture that combines the production and perception principles is thus able to learn to decode unique information from raw acoustic data without accessing real training data directly. We propose a technique to explore lexical (holistic) and sublexical (featural) learned representations in the classifier network. The results bear implications for unsupervised speech technology, as well as for unsupervised semantic modeling as language models increasingly bypass text and operate from raw acoustics.
Pages: 5298-5302
Page count: 5
Related papers
50 in total
  • [1] Joint Decoding for Speech Recognition and Semantic Tagging
    Deoras, Anoop
    Sarikaya, Ruhi
    Tur, Gokhan
    Hakkani-Tuer, Dilek
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1066 - 1069
  • [2] Lexical modeling of non-native speech for automatic speech recognition
    Livescu, K
    Glass, J
    [J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1683 - 1686
  • [3] LEXICAL ACCESS TO LARGE VOCABULARIES FOR SPEECH RECOGNITION
    FISSORE, L
    LAFACE, P
    MICCA, G
    PIERACCINI, R
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1989, 37 (08): : 1197 - 1213
  • [4] THE LEXICAL, SYNTACTIC AND SEMANTIC PROCESSING OF A SPEECH RECOGNITION SYSTEM
    RIVOIRA, S
    TORASSO, P
    [J]. INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1982, 16 (01): : 39 - 63
  • [5] The development of the orthographic consistency effect in speech recognition: From sublexical to lexical involvement
    Ventura, Paulo
    Morais, Jose
    Kolinsky, Regine
    [J]. COGNITION, 2007, 105 (03) : 547 - 576
  • [6] SPEECH EMOTION RECOGNITION USING SEMANTIC INFORMATION
    Tzirakis, Panagiotis
    Anh Nguyen
    Zafeiriou, Stefanos
    Schuller, Bjoern W.
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6279 - 6283
  • [7] Latent semantic language modeling for speech recognition
    Bellegarda, JR
    [J]. MATHEMATICAL FOUNDATIONS OF SPEECH AND LANGUAGE PROCESSING, 2004, 138 : 73 - 103
  • [8] Modeling lexical stress in continuous speech recognition for Dutch
    van den Heuvel, H
    van Kuijk, D
    Boves, L
    [J]. SPEECH COMMUNICATION, 2003, 40 (03) : 335 - 350
  • [9] Lexical and Phonetic Modeling for Arabic Automatic Speech Recognition
    Nguyen, Long
    Ng, Tim
    Nguyen, Kham
    Zbib, Rabih
    Makhoul, John
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 708 - +
  • [10] Speech recognition for illiterate access to information and technology
    Plauche, Madelaine
    Nallasamy, Udhyakumar
    Pal, Joyojeet
    Wooters, Chuck
    Ramachandran, Divya
    [J]. 2006 International Conference on Information and Communication Technologies and Development, 2006, : 83 - 92