Towards Disentangled Speech Representations

Cited by: 1
Authors
Peyser, Cal [1 ,2 ]
Huang, Ronny [2 ]
Rosenberg, Andrew [2 ]
Sainath, Tara N. [2 ]
Picheny, Michael [1 ]
Cho, Kyunghyun [1 ]
Affiliations
[1] NYU, Ctr Data Sci, New York, NY 10011 USA
[2] Google Inc, Mountain View, CA 94043 USA
DOI
10.21437/Interspeech.2022-30
CLC Classification Number
O42 [Acoustics];
Discipline Classification Codes
070206; 082403
Abstract
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only parts of the speech signal relevant to transcription while discarding irrelevant information. In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn a representation of audio that disentangles that part of the speech signal that is relevant to transcription from that part which is not. We present empirical evidence that successfully finding such a representation is tied to the randomness inherent in training. We then make the observation that these desired, disentangled solutions to the optimization problem possess unique statistical properties. Finally, we show that enforcing these properties during training improves WER by 24.5% relative on average for our joint modeling task. These observations motivate a novel approach to learning effective audio representations.
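The disentanglement objective described above can be illustrated with a toy sketch. This is not the paper's actual architecture; it is a minimal NumPy illustration, with hypothetical dimensions and linear maps, of the general idea: the representation is split into a transcription-relevant part (consumed by an ASR head) and a residual part, the full representation drives a TTS-style reconstruction, and a statistical penalty (here, a cross-covariance term) encourages the two parts to stay decorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (all hypothetical, for illustration only).
D_AUDIO, D_CONTENT, D_OTHER, D_VOCAB = 16, 4, 4, 8

# Linear "encoder" producing a split representation z = [z_content, z_other].
W_enc = rng.standard_normal((D_AUDIO, D_CONTENT + D_OTHER)) * 0.1
# ASR head: sees only the transcription-relevant half.
W_asr = rng.standard_normal((D_CONTENT, D_VOCAB)) * 0.1
# TTS-style head: reconstructs the audio from the full representation.
W_tts = rng.standard_normal((D_CONTENT + D_OTHER, D_AUDIO)) * 0.1

def forward(x):
    z = x @ W_enc
    z_content, z_other = z[:, :D_CONTENT], z[:, D_CONTENT:]
    asr_logits = z_content @ W_asr  # transcription uses content only
    recon = z @ W_tts               # reconstruction uses everything
    return z_content, z_other, asr_logits, recon

x = rng.standard_normal((32, D_AUDIO))  # a batch of "audio" frames
z_c, z_o, logits, recon = forward(x)

# Joint objective: a reconstruction loss (standing in for the TTS term)
# plus a disentanglement penalty that pushes the cross-covariance between
# the two halves of the representation toward zero.
recon_loss = np.mean((recon - x) ** 2)
cross_cov = (z_c - z_c.mean(0)).T @ (z_o - z_o.mean(0)) / len(x)
disent_penalty = np.sum(cross_cov ** 2)
```

In a real system the ASR term would be a sequence loss over transcripts and the heads would be deep networks; the point of the sketch is only that the gradient of `disent_penalty` acts on the encoder, shaping the split directly, which is the kind of statistical property the paper proposes to enforce during training.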
Pages: 3603–3607
Page count: 5