Spatial position constraint for unsupervised learning of speech representations

Cited by: 2
Authors
Humayun, Mohammad Ali [1]
Yassin, Hayati [1]
Abas, Pg Emeroylariffion [1]
Affiliations
[1] Univ Brunei Darussalam, Fac Integrated Technol, Jalan Tungku Link, Gadong, Brunei
Keywords
Low resource speech; Representation learning; Multitasking; Geometric constraint
DOI
10.7717/peerj-cs.650
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is also added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated over a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis for the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
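The multitask idea in the abstract, an auto-encoder whose training objective also regresses to each sample's geometric position, can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not the authors' formulation: `position_targets` defines "position" as a standardised offset from the global mean of the unlabelled set, the single-layer encoder and the two linear heads are hypothetical, and `weight` is an arbitrary trade-off value.

```python
import numpy as np

def position_targets(features):
    """Hypothetical 'geometric position' (an assumption, not the paper's
    definition): each sample's offset from the global mean of the
    unlabelled set, scaled by the per-dimension standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-8
    return (features - mean) / std

def multitask_loss(x, w_enc, w_dec, w_pos, targets, weight=0.5):
    """Reconstruction MSE plus a weighted position-regression MSE."""
    code = np.tanh(x @ w_enc)   # shared learnt representation
    recon = code @ w_dec        # auto-encoder reconstruction head
    pos = code @ w_pos          # regression head for the position target
    recon_loss = np.mean((recon - x) ** 2)
    pos_loss = np.mean((pos - targets) ** 2)
    return recon_loss + weight * pos_loss, recon_loss, pos_loss

# Toy data: 32 frames of 13 cepstral-like features (random, illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 13))
targets = position_targets(x)

# Randomly initialised weights; a real model would train these by gradient descent.
w_enc = rng.normal(scale=0.1, size=(13, 8))
w_dec = rng.normal(scale=0.1, size=(8, 13))
w_pos = rng.normal(scale=0.1, size=(8, 13))

total, recon_loss, pos_loss = multitask_loss(x, w_enc, w_dec, w_pos, targets)
```

Minimising `total` over the weights would push the shared `code` to retain both the content needed for reconstruction and the sample's location within the global structure of the feature set; only the loss evaluation is shown here.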
Pages: 24
Related papers
50 in total
  • [31] Learning Transferrable Representations for Unsupervised Domain Adaptation
    Sener, Ozan
    Song, Hyun Oh
    Saxena, Ashutosh
    Savarese, Silvio
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [32] Unsupervised Learning of Disentangled Representations from Video
    Denton, Emily
    Birodkar, Vighnesh
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [33] Learning robust and multilingual speech representations
    Kawakami, Kazuya
    Wang, Luyu
    Dyer, Chris
    Blunsom, Phil
    van den Oord, Aaron
    arXiv, 2020
  • [34] Learning Robust and Multilingual Speech Representations
    Kawakami, Kazuya
    Wang, Luyu
    Dyer, Chris
    Blunsom, Phil
    van den Oord, Aaron
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 1182 - 1192
  • [35] Learning word embeddings: unsupervised methods for fixed-size representations of variable-length speech segments
    Holzenberger, Nils
    Du, Mingxing
    Karadayi, Julien
    Riad, Rachid
    Dupoux, Emmanuel
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2683 - 2687
  • [36] UNSUPERVISED REPRESENTATION LEARNING OF SPEECH FOR DIALECT IDENTIFICATION
    Shon, Suwon
    Hsu, Wei-Ning
    Glass, James
    2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 105 - 111
  • [37] An Unsupervised Autoregressive Model for Speech Representation Learning
    Chung, Yu-An
    Hsu, Wei-Ning
    Tang, Hao
    Glass, James
    INTERSPEECH 2019, 2019, : 146 - 150
  • [38] Speech emotion recognition with unsupervised feature learning
    Huang, Zheng-wei
    Xue, Wen-tao
    Mao, Qi-rong
    Frontiers of Information Technology & Electronic Engineering, 2015, 16 : 358 - 366
  • [39] Speech emotion recognition with unsupervised feature learning
    Huang, Zheng-wei
    Xue, Wen-tao
    Mao, Qi-rong
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2015, 16 (05) : 358 - 366
  • [40] CONTRASTIVE UNSUPERVISED LEARNING FOR SPEECH EMOTION RECOGNITION
    Li, Mao
    Yang, Bo
    Levy, Joshua
    Stolcke, Andreas
    Rozgic, Viktor
    Matsoukas, Spyros
    Papayiannis, Constantinos
    Bone, Daniel
    Wang, Chao
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6329 - 6333