Domain-Adversarial Based Model with Phonological Knowledge for Cross-Lingual Speech Recognition

Cited by: 1
Authors
Zhan, Qingran [1]
Xie, Xiang [1,2]
Hu, Chenguang [1]
Zuluaga-Gomez, Juan [3,4,5]
Wang, Jing [1]
Cheng, Haobo [1,2]
Affiliations
[1] Beijing Inst Technol, Sch Informat & Elect, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen 518063, Peoples R China
[3] Idiap Res Inst, CH-1920 Martigny, Switzerland
[4] Ecole Polytech Fed Lausanne EPFL, CH-1015 Lausanne, Switzerland
[5] Univ Autonoma Caribe, Dept Mechatron Engn, Barranquilla 080020, Colombia
Keywords
cross-lingual automatic speech recognition (ASR); articulatory features; domain-adversarial neural network; multi-stream learning; INFORMATION; LANGUAGES
DOI
10.3390/electronics10243172
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Phonological features (articulatory features, AFs) describe the movements of the vocal organs, which are shared across languages. This paper investigates a domain-adversarial neural network (DANN) for extracting reliable AFs and applies different multi-stream techniques to cross-lingual speech recognition. First, a novel universal definition of phonological attributes is proposed for Mandarin, English, German and French. Then a DANN-based AF detector is trained on the source languages (English, German and French). For cross-lingual speech recognition, the AF detectors transfer phonological knowledge from the source languages to the target language (Mandarin). Two multi-stream approaches are introduced to fuse the acoustic features and the cross-lingual AFs. In addition, a monolingual AF system (i.e., one in which the AFs are extracted directly from the target language) is also investigated. Experiments show that the performance of the AF detector can be improved by using convolutional neural networks (CNNs) with a domain-adversarial learning method. The multi-head attention (MHA) based multi-stream approach achieves the best performance compared with the baseline, the cross-lingual adaptation approach, and the other approaches. More specifically, the MHA mode with cross-lingual AFs yields significant improvements over monolingual AFs when the training data size is restricted, and it can be easily extended to other low-resource languages.
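The domain-adversarial training mentioned in the abstract is commonly built around a gradient reversal layer. The following is a minimal sketch of that general idea in plain Python, not the paper's implementation: the class `GradientReversal`, the function `reversal_strength`, and the parameters `lam` and `gamma` are hypothetical names chosen for illustration, and the ramp-up schedule is taken from the general DANN literature; whether the paper uses it is an assumption.

```python
import math


class GradientReversal:
    """Identity in the forward pass; flips and scales gradients in the backward pass."""

    def __init__(self, lam=1.0):
        # Hypothetical reversal strength; not a value reported in the paper.
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain (language) classifier.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The feature extractor receives the negated, scaled gradient, so it is
        # pushed toward features that make the source languages harder to tell apart.
        return [-self.lam * g for g in grad_from_domain_classifier]


def reversal_strength(progress, gamma=10.0):
    """Ramp the reversal strength from 0 toward 1 as progress goes from 0 to 1.

    Assumption: this schedule is borrowed from the general DANN literature.
    """
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0


grl = GradientReversal(lam=reversal_strength(0.5))
print(grl.forward([0.2, -1.0]))    # features are unchanged
print(grl.backward([0.5, -0.25]))  # gradients are sign-flipped and scaled
```

In a real system the same behavior would usually be expressed as a custom autograd operation inside a deep learning framework, with the AF classifier and the language classifier sharing the feature extractor placed before this layer.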
Pages: 15