Unsupervised Methods for Evaluating Speech Representations

Cited by: 0
Authors
Gump, Michael [1]
Hsu, Wei-Ning [1]
Glass, James [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Source
Keywords
speech representation learning; unsupervised learning;
DOI
10.21437/Interspeech.2020-2990
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification codes
100104; 100213;
Abstract
Disentanglement is a desired property in representation learning, and a significant body of research has tried to show that it is a useful representational prior. Evaluating disentanglement is challenging, particularly for real-world data like speech, where ground-truth generative factors are typically not available. Previous work on disentangled representation learning in speech has used categorical supervision like phoneme or speaker identity in order to disentangle grouped feature spaces. However, this work differs from the typical dimension-wise view of disentanglement in other domains. This paper proposes to use low-level acoustic features to provide the structure required to evaluate dimension-wise disentanglement. By choosing well-studied acoustic features, grounded and descriptive evaluation is made possible for unsupervised representation learning. This work produces a toolkit for evaluating disentanglement in unsupervised representations of speech and evaluates its efficacy on previous research.
Pages: 170-174
Page count: 5