Towards Disentangled Speech Representations

Cited by: 1
Authors:
Peyser, Cal [1,2]
Huang, Ronny [2]
Rosenberg, Andrew [2]
Sainath, Tara N. [2]
Picheny, Michael [1]
Cho, Kyunghyun [1]
Affiliations:
[1] NYU, Ctr Data Sci, New York, NY 10011 USA
[2] Google Inc, Mountain View, CA 94043 USA
Source: INTERSPEECH 2022
Keywords:
DOI: 10.21437/Interspeech.2022-30
CLC number: O42 [Acoustics]
Subject classification codes: 070206; 082403
Abstract:
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only the parts of the speech signal relevant to transcription while discarding irrelevant information. In this paper, we construct a representation learning task based on joint modeling of ASR and TTS, and seek to learn an audio representation that disentangles the part of the speech signal that is relevant to transcription from the part that is not. We present empirical evidence that successfully finding such a representation is tied to the randomness inherent in training. We then observe that these desired, disentangled solutions to the optimization problem possess unique statistical properties. Finally, we show that enforcing these properties during training improves WER by 24.5% relative on average on our joint modeling task. These observations motivate a novel approach to learning effective audio representations.
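To make the setup in the abstract concrete, the sketch below is a minimal, hypothetical PyTorch illustration of a joint ASR/TTS objective over a shared audio encoder, with a cross-decorrelation penalty standing in for the unspecified statistical regularizer. The module names, the half/half split of the representation, and the penalty itself are assumptions made for illustration; they are not the authors' published implementation.

```python
# Hypothetical sketch (not the paper's code): a shared audio encoder whose output is
# split into a "linguistic" half (fed to an ASR head) and a "residual" half (used,
# together with the linguistic half, by a TTS-style reconstruction head). The
# decorrelation penalty is one plausible statistical regularizer; the abstract does
# not state which properties the authors actually enforce.
import torch
import torch.nn as nn

class JointASRTTS(nn.Module):
    def __init__(self, n_mels=80, d_model=256, vocab_size=1000):
        super().__init__()
        self.encoder = nn.GRU(n_mels, d_model, num_layers=2, batch_first=True)
        self.asr_head = nn.Linear(d_model // 2, vocab_size)   # sees only the first half
        self.tts_head = nn.Linear(d_model, n_mels)             # reconstructs from both halves

    def forward(self, mels):
        h, _ = self.encoder(mels)                              # (B, T, d_model)
        h_ling, h_res = h.chunk(2, dim=-1)                     # split the representation
        logits = self.asr_head(h_ling)                         # transcription-relevant path
        recon = self.tts_head(torch.cat([h_ling, h_res], -1))  # reconstruction path
        return logits, recon, h_ling, h_res

def decorrelation_penalty(a, b):
    """Penalize cross-covariance between the two halves (an assumed regularizer)."""
    a = a.reshape(-1, a.shape[-1])
    b = b.reshape(-1, b.shape[-1])
    a = a - a.mean(dim=0)
    b = b - b.mean(dim=0)
    cov = a.T @ b / a.shape[0]
    return cov.pow(2).mean()

# Toy training step on random features, illustrating the combined objective.
model = JointASRTTS()
mels = torch.randn(4, 120, 80)
targets = torch.randint(0, 1000, (4, 120))
logits, recon, h_ling, h_res = model(mels)
loss = (nn.functional.cross_entropy(logits.transpose(1, 2), targets)  # ASR term
        + nn.functional.l1_loss(recon, mels)                          # TTS/reconstruction term
        + 0.1 * decorrelation_penalty(h_ling, h_res))                 # disentanglement term
loss.backward()
```

In this sketch only the first half of the encoder output feeds the ASR head, so the penalty discourages transcription-relevant information from leaking into the second half; the specific statistical properties the paper enforces are given in the full text, not here.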
Pages: 3603-3607
Page count: 5
Related papers (50 in total):
  • [41] Linear Disentangled Representations and Unsupervised Action Estimation. Painter, Matthew; Hare, Jonathon; Prugel-Bennett, Adam. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, 33.
  • [42] Learning Disentangled Representations of Video with Missing Data. Massague, Armand Comas; Zhang, Chi; Feric, Zlatan; Camps, Octavia; Yu, Rose. Advances in Neural Information Processing Systems 33 (NeurIPS 2020), 2020, 33.
  • [43] On Disentangled Representations Learned from Correlated Data. Traeuble, Frederik; Creager, Elliot; Kilbertus, Niki; Locatello, Francesco; Dittadi, Andrea; Goyal, Anirudh; Schoelkopf, Bernhard; Bauer, Stefan. International Conference on Machine Learning, Vol. 139, 2021: 7412-7422.
  • [44] Handwritten Text Generation via Disentangled Representations. Liu, Xiyan; Meng, Gaofeng; Xiang, Shiming; Pan, Chunhong. IEEE Signal Processing Letters, 2021, 28: 1838-1842.
  • [45] Learning Disentangled Feature Representations for Anomaly Detection. Lee, Wei-Yu; Wang, Yu-Chiang Frank. 2020 IEEE International Conference on Image Processing (ICIP), 2020: 2156-2160.
  • [46] Learning Disentangled Representations via Independent Subspaces. Awiszus, Maren; Ackermann, Hanno; Rosenhahn, Bodo. 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2019: 560-568.
  • [47] Are Disentangled Representations Helpful for Abstract Visual Reasoning? van Steenkiste, Sjoerd; Locatello, Francesco; Schmidhuber, Juergen; Bachem, Olivier. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019, 32.
  • [48] Image Deraining via Invertible Disentangled Representations. Chen, Xueling; Zhou, Xuan; Sun, Wei; Zhang, Yanning. Engineering Applications of Artificial Intelligence, 2024, 137.
  • [49] Unsupervised Learning of Disentangled Representations from Video. Denton, Emily; Birodkar, Vighnesh. Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017, 30.
  • [50] Where and What? Examining Interpretable Disentangled Representations. Zhu, Xinqi; Xu, Chang; Tao, Dacheng. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 5857-5866.