Unsupervised Methods for Evaluating Speech Representations

Citations: 0
Authors
Gump, Michael [1]
Hsu, Wei-Ning [1]
Glass, James [1]
Affiliation
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
INTERSPEECH 2020
Keywords
speech representation learning; unsupervised learning
DOI
10.21437/Interspeech.2020-2990
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Disentanglement is a desired property in representation learning, and a significant body of research has tried to show that it is a useful representational prior. Evaluating disentanglement is challenging, particularly for real-world data like speech, where ground-truth generative factors are typically not available. Previous work on disentangled representation learning in speech has used categorical supervision, such as phoneme or speaker identity, to disentangle grouped feature spaces. However, that approach differs from the typical dimension-wise view of disentanglement in other domains. This paper proposes to use low-level acoustic features to provide the structure required to evaluate dimension-wise disentanglement. Choosing well-studied acoustic features makes grounded and descriptive evaluation possible for unsupervised representation learning. This work produces a toolkit for evaluating disentanglement in unsupervised representations of speech and evaluates its efficacy on previous research.
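
As a rough illustration of what a dimension-wise evaluation against acoustic features could look like, the sketch below correlates each latent dimension with each acoustic feature and reports how strongly a dimension prefers a single feature. It is not the paper's toolkit or metric: the function names, the "gap" score, and the synthetic "pitch" and "energy" columns are all assumptions made for this example.

```python
import numpy as np


def abs_correlation_matrix(latents, features):
    """Absolute Pearson correlation between every latent dimension (rows)
    and every acoustic feature (columns). Hypothetical helper."""
    z = latents - latents.mean(axis=0)
    f = features - features.mean(axis=0)
    z_std = z.std(axis=0)
    f_std = f.std(axis=0)
    # Constant columns carry no information; avoid dividing by zero.
    z_std[z_std == 0] = 1.0
    f_std[f_std == 0] = 1.0
    z = z / z_std
    f = f / f_std
    return np.abs(z.T @ f) / latents.shape[0]


def dimension_gaps(corr):
    """For each latent dimension, the strongest correlation minus the
    runner-up. A large gap suggests the dimension tracks one feature."""
    sorted_corr = np.sort(corr, axis=1)[:, ::-1]
    return sorted_corr[:, 0] - sorted_corr[:, 1]


# Synthetic stand-ins for per-frame acoustic features (assumed names).
rng = np.random.default_rng(0)
n_frames = 2000
features = rng.normal(size=(n_frames, 2))  # columns: "pitch", "energy"

# Three toy latent dimensions: two clean, one mixing both features.
latents = np.column_stack([
    features[:, 0] + 0.1 * rng.normal(size=n_frames),  # follows "pitch"
    features[:, 1] + 0.1 * rng.normal(size=n_frames),  # follows "energy"
    features.sum(axis=1),                              # entangled
])

corr = abs_correlation_matrix(latents, features)
gaps = dimension_gaps(corr)
print(np.round(corr, 2))
print(np.round(gaps, 2))
```

In this toy setup the first two dimensions should show a large gap and the third a gap near zero, since it mixes both features equally. With real speech, the feature columns would come from an acoustic feature extractor rather than random draws.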
Pages: 170-174
Page count: 5