Sampling strategies in Siamese Networks for unsupervised speech representation learning

Cited by: 10
Authors
Riad, Rachid [1, 2]
Dancette, Corentin [1]
Karadayi, Julien [1]
Zeghidour, Neil [1, 3]
Schatz, Thomas [1, 4, 5, 6]
Dupoux, Emmanuel [1, 3]
Affiliations
[1] PSL Res Univ, INRIA, EHESS, CoML,ENS,CNRS, Paris, France
[2] UPEC, INSERM, ENS, NPI, Creteil, France
[3] Facebook AI Res, Paris, France
[4] Univ Maryland, Dept Linguist, College Pk, MD 20742 USA
[5] Univ Maryland, UMIACS, College Pk, MD 20742 USA
[6] MIT, Dept Linguist, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Funding
European Research Council
Keywords
language acquisition; speech recognition; sampling; Zipf's law; weakly supervised learning; unsupervised learning; Siamese network; speech embeddings; ABX; zero resource speech technology
DOI
10.21437/Interspeech.2018-2384
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies have investigated siamese network architectures for learning invariant speech representations using same-different side information at the word level. Here we systematically investigate an often-ignored component of siamese networks: the sampling procedure (how pairs of same vs. different tokens are selected). We show that sampling strategies taking into account Zipf's law, the distribution of speakers, and the proportions of same and different pairs of words significantly impact the performance of the network. In particular, we show that word frequency compression improves learning across a large range of variations in the number of training pairs. This effect does not apply to the same extent to the fully unsupervised setting, where the pairs of same-different words are obtained by spoken term discovery. We apply these results to pairs of words discovered using an unsupervised algorithm and show an improvement over the state of the art in unsupervised representation learning using siamese networks.
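The abstract does not spell out how word frequency compression enters the pair sampler. The Python sketch below shows one plausible reading, assuming word tokens are given as (word, token_id) tuples: word types are drawn with probability proportional to a compressed count f(n), and a parameter `p_same` sets the proportion of same pairs. The function names, the `mode` options (`none`, `sqrt`, `log`) and the pair format are hypothetical illustrations, not the authors' code, and speaker-aware sampling is not modelled.

```python
import math
import random
from collections import defaultdict


def compress(count: int, mode: str) -> float:
    """Map a raw word-type frequency to a sampling weight (hypothetical modes)."""
    if mode == "none":
        return float(count)          # weight types by raw (Zipfian) counts
    if mode == "sqrt":
        return math.sqrt(count)      # mild frequency compression
    if mode == "log":
        return math.log(1 + count)   # strong frequency compression
    raise ValueError(f"unknown compression mode: {mode}")


def sample_pairs(tokens, n_pairs, p_same=0.5, mode="log", seed=0):
    """Sample (token_a, token_b, is_same) triples from (word, token_id) tuples."""
    rng = random.Random(seed)
    by_word = defaultdict(list)
    for word, token_id in tokens:
        by_word[word].append(token_id)

    types = sorted(by_word)
    weights = [compress(len(by_word[w]), mode) for w in types]
    # Only word types with at least two tokens can yield a "same" pair.
    same_types = [w for w in types if len(by_word[w]) >= 2]
    same_weights = [compress(len(by_word[w]), mode) for w in same_types]
    if p_same > 0 and not same_types:
        raise ValueError("no word type has two tokens; cannot build same pairs")
    if p_same < 1 and len(types) < 2:
        raise ValueError("need at least two word types for different pairs")

    pairs = []
    for _ in range(n_pairs):
        if rng.random() < p_same:
            # Same pair: one type drawn by compressed frequency, two distinct tokens of it.
            (w,) = rng.choices(same_types, weights=same_weights)
            a, b = rng.sample(by_word[w], 2)
            pairs.append((a, b, True))
        else:
            # Different pair: two distinct types drawn by compressed frequency.
            w1, w2 = rng.choices(types, weights=weights, k=2)
            while w1 == w2:
                w1, w2 = rng.choices(types, weights=weights, k=2)
            pairs.append((rng.choice(by_word[w1]), rng.choice(by_word[w2]), False))
    return pairs


if __name__ == "__main__":
    # Toy corpus: one very frequent type and two rare ones.
    toy = [("the", i) for i in range(100)] + [("cat", 100), ("cat", 101), ("dog", 102)]
    for m in ("none", "log"):
        same = [p for p in sample_pairs(toy, 1000, 0.5, m) if p[2]]
        print(m, sum(tid < 100 for tid, _, _ in same) / len(same))
```

With `mode="none"`, a frequent type such as "the" dominates the same pairs in proportion to its raw count; `log` flattens the type distribution so rarer words appear more often, which is the kind of compression the abstract reports as helpful.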
Pages: 2658-2662
Page count: 5