Spatial position constraint for unsupervised learning of speech representations

Cited by: 2
Authors
Humayun, Mohammad Ali [1]
Yassin, Hayati [1]
Abas, Pg Emeroylariffion [1]
Affiliations
[1] Univ Brunei Darussalam, Fac Integrated Technol, Jalan Tungku Link, Gadong, Brunei
Keywords
Low resource speech; Representation learning; Multitasking; Geometric constraint;
DOI
10.7717/peerj-cs.650
Chinese Library Classification
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims to use unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, and deep auto-encoder variants have been the most successful at finding such representations. This paper proposes a novel mechanism to incorporate the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as an additional constraint for the representation learning auto-encoder. The representation learnt by the proposed model has been evaluated on a supervised classification task for limited vocabulary keyword spotting, with the proposed representation outperforming the commonly used cepstral features by about 9% in terms of classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis of the Kadazan dataset also confirms the superiority of the proposed representation for limited annotation. The results are significant as they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
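The idea of adding position regression as a second objective to an auto-encoder can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the geometric position is computed, so the hypothetical `spatial_position` helper below simply assumes a projection onto the top two principal components of the unlabelled feature set, and the hypothetical `multitask_loss`, its `position_weight` parameter, the tanh encoder, and the layer sizes are likewise illustrative assumptions.

```python
import numpy as np


def spatial_position(features, n_components=2):
    """Assumed stand-in for a sample's position in the global feature structure:
    coordinates along the top principal components of the whole feature set."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def multitask_loss(x, x_hat, pos, pos_hat, position_weight=0.5):
    """Reconstruction error plus a weighted position-regression error (both MSE)."""
    reconstruction = np.mean((x - x_hat) ** 2)
    position = np.mean((pos - pos_hat) ** 2)
    return reconstruction + position_weight * position


rng = np.random.default_rng(0)
features = rng.normal(size=(100, 13))   # placeholder for 13-dim cepstral frames
positions = spatial_position(features)  # regression targets, shape (100, 2)

# Untrained toy network: one shared encoder feeding two linear heads.
hidden = 8
w_enc = rng.normal(scale=0.1, size=(13, hidden))
w_dec = rng.normal(scale=0.1, size=(hidden, 13))  # reconstruction head
w_pos = rng.normal(scale=0.1, size=(hidden, 2))   # position head

codes = np.tanh(features @ w_enc)                 # learnt representation
loss = multitask_loss(features, codes @ w_dec, positions, codes @ w_pos)
print(f"untrained multitask loss: {loss:.3f}")
```

In a setup like this, both heads share the encoder, so minimizing the combined loss would push `codes` to retain information about where each sample sits in the feature set as well as how to reconstruct it. The downstream classifier would then be trained on `codes` using the limited labels.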
Pages: 24
Related papers
  • [1] Spatial position constraint for unsupervised learning of speech representations
    Humayun M.A.
    Yassin H.
    Abas P.E.
    PeerJ Computer Science, 2021, 7 : 1 - 24
  • [2] Unsupervised Methods for Evaluating Speech Representations
    Gump, Michael
    Hsu, Wei-Ning
    Glass, James
    INTERSPEECH 2020, 2020, : 170 - 174
  • [3] Learning Invariant Object and Spatial View Representations in the Brain Using Slow Unsupervised Learning
    Rolls, Edmund T.
    FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2021, 15
  • [4] Unsupervised learning of invariant representations
    Anselmi, Fabio
    Leibo, Joel Z.
    Rosasco, Lorenzo
    Mutch, Jim
    Tacchetti, Andrea
    Poggio, Tomaso
    THEORETICAL COMPUTER SCIENCE, 2016, 633 : 112 - 121
  • [5] Unsupervised Learning of Face Representations
    Datta, Samyak
    Sharma, Gaurav
    Jawahar, C. V.
    PROCEEDINGS 2018 13TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE & GESTURE RECOGNITION (FG 2018), 2018, : 135 - 142
  • [6] Models for unsupervised learning of representations
    Garionis, R
    8TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING, VOLS 1-3, PROCEEDING, 2001, : 253 - 258
  • [7] Disentangling Prosody Representations With Unsupervised Speech Reconstruction
    Qu L.
    Li T.
    Weber C.
    Pekarek-Rosin T.
    Ren F.
    Wermter S.
    IEEE/ACM Transactions on Audio Speech and Language Processing, 2024, 32 : 39 - 54
  • [8] UNSUPERVISED LEARNING OF SEMANTIC AUDIO REPRESENTATIONS
    Jansen, Aren
    Plakal, Manoj
    Pandya, Ratheet
    Ellis, Daniel P. W.
    Hershey, Shawn
    Liu, Jiayang
    Moore, R. Channing
    Saurous, Rif A.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 126 - 130
  • [9] Learning Unsupervised Representations for ICU Timeseries
    Weatherhead, Addison
    Greer, Robert
    Moga, Michael-Alice
    Mazwi, Mjaye
    Eytan, Danny
    Goldenberg, Anna
    Tonekaboni, Sana
    CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 174, 2022, 174 : 152 - 168
  • [10] Geometry Representations with Unsupervised Feature Learning
    Yoon, Yeo-Jin
    Lelidis, Alexander
    Oeztireli, A. Cengiz
    Hwang, Jung-Min
    Gross, Markus
    Choi, Soo-Mi
    2016 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2016, : 137 - 142