Towards Disentangled Speech Representations

Cited by: 1
Authors
Peyser, Cal [1, 2]
Huang, Ronny [2]
Rosenberg, Andrew [2]
Sainath, Tara N. [2]
Picheny, Michael [1]
Cho, Kyunghyun [1]
Affiliations
[1] NYU, Ctr Data Sci, New York, NY 10011 USA
[2] Google Inc, Mountain View, CA 94043 USA
Source
INTERSPEECH 2022
Keywords
DOI
10.21437/Interspeech.2022-30
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only parts of the speech signal relevant to transcription while discarding irrelevant information. In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn a representation of audio that disentangles that part of the speech signal that is relevant to transcription from that part which is not. We present empirical evidence that successfully finding such a representation is tied to the randomness inherent in training. We then make the observation that these desired, disentangled solutions to the optimization problem possess unique statistical properties. Finally, we show that enforcing these properties during training improves WER by 24.5% relative on average for our joint modeling task. These observations motivate a novel approach to learning effective audio representations.
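The abstract reports its result as a *relative* WER improvement. As a reminder of what that means, here is a minimal sketch (not the authors' evaluation code) of word error rate computed as a word-level Levenshtein distance divided by the reference length, plus the usual relative-improvement formula (baseline WER minus new WER, divided by baseline WER). The function names are hypothetical, and the numbers in the usage note are made up for illustration; they are not results from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # match or substitution
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that the new system removes."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # One deleted word out of six reference words: WER is about 0.167.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Illustrative values only: going from 20% to 15% WER is roughly a 25% relative gain.
    print(relative_improvement(0.20, 0.15))
```

Note that a relative figure depends on the baseline: the same absolute drop of 5 points is a 25% relative gain from a 20% baseline but only a 12.5% relative gain from a 40% baseline.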
Pages: 3603-3607
Number of pages: 5
Related papers
50 in total
  • [21] CONTRASTIVE PREDICTIVE CODING SUPPORTED FACTORIZED VARIATIONAL AUTOENCODER FOR UNSUPERVISED LEARNING OF DISENTANGLED SPEECH REPRESENTATIONS
    Ebbers, Janek
    Kuhlmann, Michael
    Cord-Landwehr, Tobias
    Haeb-Umbach, Reinhold
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021: 3860-3864
  • [22] Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations
    Oh, Yoori
    Lee, Juheon
    Han, Yoseob
    Lee, Kyogu
    INTERSPEECH 2023, 2023: 4818-4822
  • [23] Domain Agnostic Learning with Disentangled Representations
    Peng, Xingchao
    Huang, Zijun
    Sun, Ximeng
    Saenko, Kate
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [24] Learning Disentangled Representations of Negation and Uncertainty
    Vasilakes, Jake
    Zerva, Chrysoula
    Miwa, Makoto
    Ananiadou, Sophia
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022: 8380-8397
  • [25] Deep Disentangled Representations for Volumetric Reconstruction
    Grant, Edward
    Kohli, Pushmeet
    van Gerven, Marcel
    COMPUTER VISION - ECCV 2016 WORKSHOPS, PT III, 2016, 9915: 266-279
  • [26] A Contrastive Objective for Learning Disentangled Representations
    Kahana, Jonathan
    Hoshen, Yedid
    COMPUTER VISION, ECCV 2022, PT XXVI, 2022, 13686: 579-595
  • [27] An Identifiable Double VAE For Disentangled Representations
    Mita, Graziano
    Filippone, Maurizio
    Michiardi, Pietro
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [28] Learning disentangled representations in the imaging domain
    Liu, Xiao
    Sanchez, Pedro
    Thermos, Spyridon
    O'Neil, Alison Q.
    Tsaftaris, Sotirios A.
    MEDICAL IMAGE ANALYSIS, 2022, 80
  • [29] Adversarial Robustness through Disentangled Representations
    Yang, Shuo
    Guo, Tianyu
    Wang, Yunhe
    Xu, Chang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35: 3145-3153
  • [30] STOCHASTIC VIDEO GENERATION WITH DISENTANGLED REPRESENTATIONS
    Li, Maomao
    Yuan, Chun
    Lin, Zhihui
    Zheng, Zhuobin
    Cheng, Yangyang
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019: 224-229