Consistency self-supervised learning method for robust automatic speech recognition

Cited by: 0
Authors
Gao, Changfeng [1,2]
Cheng, Gaofeng [1,2]
Zhang, Pengyuan [1,2]
Affiliations
[1] Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
[2] University of Chinese Academy of Sciences, Beijing 100049, China
Source
Shengxue Xuebao/Acta Acustica | 2023, Vol. 48, No. 3
Keywords
Acoustic environment - Automatic speech recognition - Far-field - Pre-training - Recognition methods - Robust speech recognition - Self-supervised learning - Simulated speech - Speech recognition performance - Supervised learning methods
DOI
Not available
Pages: 578-587