Towards Better Domain Adaptation for Self-Supervised Models: A Case Study of Child ASR

Cited by: 6
Authors
Fan, Ruchao [1 ]
Zhu, Yunzheng [1 ]
Wang, Jinhan [1 ]
Alwan, Abeer [2 ]
Affiliations
[1] Univ Calif Los Angeles, ECE Dept, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Elect & Comp Engn, Los Angeles, CA 90095 USA
Keywords
Adaptation models; Task analysis; Data models; Transformers; Speech recognition; Predictive coding; Computational modeling; Self-supervised learning; end-to-end speech recognition; children's ASR; domain adaptation; residual adapters; representation; speech
DOI
10.1109/JSTSP.2022.3200910
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification
0808; 0809
Abstract
Recently, self-supervised learning (SSL) from unlabelled speech data has gained increased attention in the automatic speech recognition (ASR) community. Typical SSL methods include autoregressive predictive coding (APC), Wav2vec2.0, and hidden unit BERT (HuBERT). However, SSL models are biased toward their pretraining data: when they are finetuned on data from another domain, a domain shift occurs that can limit knowledge transfer to downstream tasks. In this paper, we propose a novel framework, domain responsible adaptation and finetuning (DRAFT), to reduce domain shift in pretrained speech models, and evaluate it for both causal and non-causal transformers. For the causal transformer, an extension of APC (E-APC) is proposed that learns richer information from unlabelled data by predicting multiple temporally-shifted sequences. For the non-causal transformer, various solutions for using bidirectional APC (Bi-APC) are investigated. In addition, the DRAFT framework is examined for Wav2vec2.0 and HuBERT, which use non-causal transformers as the backbone. Experiments are conducted on child ASR (the OGI and MyST databases) using SSL models trained on unlabelled adult speech from LibriSpeech. Relative WER improvements of up to 19.7% on the two child tasks are observed compared to the pretrained models without adaptation. With the proposed methods (E-APC and DRAFT), the relative WER improvements are even larger (30% and 19% on the OGI and MyST data, respectively) compared to models trained without pretraining.
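The abstract only outlines the two key components. As a rough illustration, the sketch below shows one plausible form of (a) a residual adapter, the kind of module the DRAFT framework relies on (per the "residual adapters" keyword), and (b) an E-APC-style loss that averages APC prediction losses over several temporal shifts. All class names, dimensions, and the model interface here are assumptions made for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualAdapter(nn.Module):
        # Bottleneck adapter with a skip connection, inserted after a
        # transformer sub-layer. In the usual adapter recipe, only these
        # parameters are trained during adaptation while the pretrained
        # backbone stays frozen. Dimensions are illustrative assumptions.
        def __init__(self, d_model: int, bottleneck: int = 64):
            super().__init__()
            self.norm = nn.LayerNorm(d_model)
            self.down = nn.Linear(d_model, bottleneck)
            self.up = nn.Linear(bottleneck, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x + self.up(F.relu(self.down(self.norm(x))))

    def multi_shift_apc_loss(model: nn.Module,
                             feats: torch.Tensor,
                             shifts=(1, 2, 3)) -> torch.Tensor:
        # feats: (batch, time, dim) acoustic features. For each shift n,
        # a causal model predicts frame t+n from frames up to t, and the
        # L1 prediction losses over several shifts are averaged, in the
        # spirit of E-APC. The model interface is a hypothetical stand-in:
        # it is assumed to map (batch, T, dim) to (batch, T, dim).
        total = feats.new_zeros(())
        for n in shifts:
            pred = model(feats[:, :-n])          # (batch, time - n, dim)
            total = total + F.l1_loss(pred, feats[:, n:])
        return total / len(shifts)

    # Example usage of the adapter on a backbone's hidden states:
    #   adapter = ResidualAdapter(d_model=768)
    #   hidden = adapter(hidden)   # hidden: (batch, time, 768)

Freezing the backbone and updating only the small adapters keeps adaptation cheap and avoids overwriting the pretrained representations; whether DRAFT follows exactly this recipe is detailed in the paper itself, not in the abstract above.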
Pages: 1242-1252
Page count: 11