TRANSFORMER BASED UNSUPERVISED PRE-TRAINING FOR ACOUSTIC REPRESENTATION LEARNING

Cited by: 15
Authors
Zhang, Ruixiong [1 ]
Wu, Haiwei [1 ]
Li, Wubo [1 ]
Jiang, Dongwei [1 ]
Zou, Wei [1 ]
Li, Xiangang [1 ]
Affiliations
[1] DiDi Chuxing, Beijing, People's Republic of China
Keywords
unsupervised pre-training; Transformer; acoustic representation learning; emotion
DOI
10.1109/ICASSP39728.2021.9414996
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206; 082403;
Abstract
Recently, a variety of acoustic tasks and related applications have emerged. For many of these tasks, the amount of labeled data is limited. To address this problem, we propose an unsupervised pre-training method that uses a Transformer-based encoder to learn a general and robust high-level representation for all acoustic tasks. Experiments were conducted on three kinds of acoustic tasks: speech emotion recognition, sound event detection, and speech translation. All experiments show that pre-training on a task's own training data can significantly improve performance. With larger pre-training data combining the MuST-C, Librispeech, and ESC-US datasets, the UAR for speech emotion recognition further improves by an absolute 4.3% on the IEMOCAP dataset. For sound event detection, the F1 score further improves by an absolute 1.5% on the DCASE2018 Task 5 development set and 2.1% on the evaluation set. For speech translation, the BLEU score further improves by a relative 12.2% on the En-De dataset and 8.4% on the En-Fr dataset.
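For readers unfamiliar with this family of methods, the sketch below illustrates one plausible form of Transformer-based unsupervised pre-training on acoustic features: a Transformer encoder reconstructs randomly masked log-Mel frames, and its hidden states serve as the learned representation passed to downstream task heads. This record does not specify the paper's exact objective, masking policy, or model size, so the names (AcousticPretrainer, random_frame_mask), the 15% masking ratio, the L1 loss, and all dimensions below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of Transformer-based unsupervised
# pre-training on acoustic features via masked-frame reconstruction.
import torch
import torch.nn as nn


class AcousticPretrainer(nn.Module):
    """Transformer encoder pre-trained by reconstructing masked acoustic frames."""

    def __init__(self, n_mels=80, d_model=512, n_heads=8, n_layers=6, max_len=2000):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)
        # Learned positional embedding (a sinusoidal encoding would also work).
        self.pos_emb = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=2048, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.reconstruct = nn.Linear(d_model, n_mels)  # predict the masked frames

    def forward(self, feats, mask):
        # feats: (batch, time, n_mels) log-Mel features
        # mask:  (batch, time) bool, True where frames are hidden from the encoder
        x = feats.masked_fill(mask.unsqueeze(-1), 0.0)
        h = self.input_proj(x) + self.pos_emb[:, : x.size(1)]
        h = self.encoder(h)                      # contextual acoustic representation
        pred = self.reconstruct(h)
        # L1 reconstruction loss on masked positions only (illustrative choice).
        loss = (pred - feats).abs()[mask].mean()
        return loss, h


def random_frame_mask(batch, time, ratio=0.15):
    """Illustrative masking policy: hide a random 15% of frames."""
    return torch.rand(batch, time) < ratio


if __name__ == "__main__":
    model = AcousticPretrainer()
    feats = torch.randn(4, 200, 80)              # dummy batch of log-Mel features
    loss, reps = model(feats, random_frame_mask(4, 200))
    loss.backward()                              # one unsupervised pre-training step
    print(loss.item(), reps.shape)               # reps feed downstream task heads
```

After pre-training, the encoder's hidden states (reps above) would be fed to task-specific heads for speech emotion recognition, sound event detection, or speech translation, which is the general workflow the abstract describes.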
Pages: 6933-6937
Page count: 5
相关论文
共 50 条
  • [1] RAPT: Pre-training of Time-Aware Transformer for Learning Robust Healthcare Representation
    Ren, Houxing
    Wang, Jingyuan
    Zhao, Wayne Xin
    Wu, Ning
    [J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 3503 - 3511
  • [2] Improving Transformer-based Speech Recognition with Unsupervised Pre-training and Multi-task Semantic Knowledge Learning
    Li, Song
    Li, Lin
    Hong, Qingyang
    Liu, Lingling
    [J]. INTERSPEECH 2020, 2020, : 5006 - 5010
  • [3] Lottery Hypothesis based Unsupervised Pre-training for Model Compression in Federated Learning
    Itahara, Sohei
    Nishio, Takayuki
    Morikura, Masahiro
    Yamamoto, Koji
    [J]. 2020 IEEE 92ND VEHICULAR TECHNOLOGY CONFERENCE (VTC2020-FALL), 2020,
  • [4] Pre-training Strategies and Datasets for Facial Representation Learning
    Bulat, Adrian
    Cheng, Shiyang
    Yang, Jing
    Garbett, Andrew
    Sanchez, Enrique
    Tzimiropoulos, Georgios
    [J]. COMPUTER VISION, ECCV 2022, PT XIII, 2022, 13673 : 107 - 125
  • [5] Why Does Unsupervised Pre-training Help Deep Learning?
    Erhan, Dumitru
    Bengio, Yoshua
    Courville, Aaron
    Manzagol, Pierre-Antoine
    Vincent, Pascal
    Bengio, Samy
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 625 - 660
  • [6] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [7] Unsupervised Pre-Training for Voice Activation
    Kolesau, Aliaksei
    Sesok, Dmitrij
    [J]. APPLIED SCIENCES-BASEL, 2020, 10 (23): : 1 - 13
  • [8] A Study of Speech Recognition for Kazakh Based on Unsupervised Pre-Training
    Meng, Weijing
    Yolwas, Nurmemet
    [J]. SENSORS, 2023, 23 (02)
  • [9] Multilingual Molecular Representation Learning via Contrastive Pre-training
    Guo, Zhihui
    Sharma, Pramod
    Martinez, Andy
    Du, Liang
    Abraham, Robin
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3441 - 3453
  • [10] RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
    Cai, Yuanying
    Zhang, Chuheng
    Shen, Wei
    Zhang, Xuyun
    Ruan, Wenjie
    Huang, Longbo
    [J]. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 6879 - 6887