Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

Times Cited: 1
Authors
Zhan, Qingran [1 ]
Xie, Xiang [1 ,2 ]
Hu, Chenguang [1 ]
Zuluaga-Gomez, Juan [3 ,4 ,5 ]
Wang, Jing [1 ]
Cheng, Haobo [1 ,2 ]
Affiliations
[1] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen 518063, Peoples R China
[3] Idiap Res Inst, CH-1920 Martigny, Switzerland
[4] Ecole Polytech Fed Lausanne EPFL, CH-1015 Lausanne, Switzerland
[5] Univ Autonoma Caribe, Dept Mechatron Engn, Barranquilla 080020, Colombia
Keywords
cross-lingual automatic speech recognition (ASR); articulatory features; domain-adversarial neural network; multi-stream learning; INFORMATION; LANGUAGES
DOI
10.3390/electronics10243172
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Phonology-based features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) for extracting reliable AFs, and different multi-stream techniques are used for cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German, and French. Then a DANN-based AF detector is trained on the source languages (English, German, and French). For cross-lingual speech recognition, the AF detectors transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features with the cross-lingual AFs. In addition, a monolingual AF system (i.e., AFs extracted directly from the target language) is also investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNNs) with domain-adversarial learning. The multi-head attention (MHA)-based multi-stream approach achieves the best performance compared to the baseline, the cross-lingual adaptation approach, and the other fusion approaches. More specifically, when the amount of training data is restricted, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs, and the approach can easily be extended to other low-resource languages.
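Since the abstract names the two main techniques, a CNN-based AF detector trained with domain-adversarial learning and MHA-based multi-stream fusion, without giving implementation details, the following minimal PyTorch sketch only illustrates how such components are commonly built. All class names, layer sizes, the 24-attribute inventory, the 3 source-language domains, and the per-frame domain labels are assumptions made for illustration, not the configuration used in the paper.

# Minimal sketch (PyTorch); all dimensions and names below are illustrative assumptions.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; scales and negates gradients in the
    backward pass, which is the core of domain-adversarial (DANN) training."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing from the domain classifier into the
        # shared feature extractor; no gradient for the scalar lambd.
        return -ctx.lambd * grad_output, None

class DannAFDetector(nn.Module):
    """CNN feature extractor with two heads: a per-frame articulatory-attribute
    classifier and a language (domain) classifier behind a gradient-reversal
    layer, so the shared features are pushed to be language-invariant."""

    def __init__(self, n_mels=40, n_attributes=24, n_languages=3, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((None, 1)),  # pool over the frequency axis only
        )
        self.attr_head = nn.Linear(32, n_attributes)   # phonological attribute posteriors
        self.domain_head = nn.Linear(32, n_languages)  # source-language classifier

    def forward(self, feats):
        # feats: (batch, time, n_mels) acoustic features, e.g. log-mel filterbanks
        x = self.cnn(feats.unsqueeze(1))       # (batch, 32, time, 1)
        x = x.squeeze(-1).transpose(1, 2)      # (batch, time, 32) shared representation
        attr_logits = self.attr_head(x)        # attribute-detection branch
        reversed_x = GradientReversal.apply(x, self.lambd)
        domain_logits = self.domain_head(reversed_x)  # adversarial language branch
        return attr_logits, domain_logits

A similarly hedged sketch of the MHA-based fusion of the acoustic stream with the cross-lingual AF stream, followed by a toy training step for the detector:

class MhaFusion(nn.Module):
    """Multi-stream fusion: the acoustic stream attends to the AF stream via
    multi-head attention, and the two representations are concatenated."""

    def __init__(self, acoustic_dim=40, af_dim=24, d_model=128, n_heads=4):
        super().__init__()
        self.acoustic_proj = nn.Linear(acoustic_dim, d_model)
        self.af_proj = nn.Linear(af_dim, d_model)
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, acoustic, af_posteriors):
        q = self.acoustic_proj(acoustic)       # (batch, time, d_model) query stream
        kv = self.af_proj(af_posteriors)       # (batch, time, d_model) key/value stream
        fused, _ = self.mha(q, kv, kv)         # acoustic frames attend to AF posteriors
        return torch.cat([q, fused], dim=-1)   # concatenated multi-stream representation

# Toy training step for the AF detector: the multi-task loss is a plain sum,
# because the gradient-reversal layer already flips the sign of the domain loss
# with respect to the shared CNN parameters.
detector = DannAFDetector()
feats = torch.randn(8, 200, 40)                            # 8 utterances, 200 frames
attr_targets = torch.randint(0, 2, (8, 200, 24)).float()   # multi-label attribute targets
lang_targets = torch.randint(0, 3, (8, 200))               # per-frame source-language labels

attr_logits, domain_logits = detector(feats)
loss = (nn.functional.binary_cross_entropy_with_logits(attr_logits, attr_targets)
        + nn.functional.cross_entropy(domain_logits.reshape(-1, 3), lang_targets.reshape(-1)))
loss.backward()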
Pages: 15