Spatial position constraint for unsupervised learning of speech representations

Cited by: 2
Authors
Humayun, Mohammad Ali [1]
Yassin, Hayati [1]
Abas, Pg Emeroylariffion [1]
Affiliations
[1] Univ Brunei Darussalam, Fac Integrated Technol, Jalan Tungku Link, Gadong, Brunei
Keywords
Low resource speech; Representation learning; Multitasking; Geometric constraint
DOI
10.7717/peerj-cs.650
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims to use unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, and deep auto-encoder variants have been the most successful at finding such representations. This paper proposes a novel mechanism to incorporate the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated on a supervised classification task for limited vocabulary keyword spotting, where it outperforms the commonly used cepstral features by about 9% in classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis of the Kadazan dataset also confirms the superiority of the proposed representation under limited annotation. The results are significant because they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
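The idea of adding position regression as a second objective for an auto-encoder can be illustrated with a minimal sketch. The abstract does not say how the geometric position is computed or how the two objectives are weighted, so everything below is an assumption made for illustration: the position is taken to be the Euclidean distances from a sample to a few fixed anchor points, and the names `position_targets`, `multitask_loss`, `anchors`, and `weight` are hypothetical rather than taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def position_targets(sample, anchors):
    """Assumed stand-in for a 'geometric position': the distances from
    one sample to a set of fixed anchor points in feature space."""
    return [euclidean(sample, anchor) for anchor in anchors]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multitask_loss(x, x_reconstructed, position, position_predicted, weight=1.0):
    """Reconstruction error plus a weighted position-regression error.
    The additive form and the weight are assumptions, not the paper's."""
    return mse(x, x_reconstructed) + weight * mse(position, position_predicted)


# Toy example with two hypothetical anchors in a 2-D feature space.
anchors = [[0.0, 0.0], [3.0, 4.0]]
x = [3.0, 0.0]
target = position_targets(x, anchors)

# Perfect reconstruction, but the second position estimate is off by 1.
loss = multitask_loss(x, [3.0, 0.0], target, [3.0, 3.0], weight=0.5)
print(target, loss)
```

In this toy case the reconstruction term is zero, so the whole loss comes from the position term; in a real model both outputs would come from decoder heads sharing one encoder, which is what would push the shared representation to encode position.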
Pages: 24