Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features

被引:0
|
作者
Zhan, Qingran [1 ,2 ]
Motlicek, Petr [2 ]
Du, Shixuan [1 ]
Shan, Yahui [1 ]
Ma, Sifan [1 ]
Xie, Xiang [1 ,3 ]
机构
[1] Beijing Inst Technol, Informat & Elect Inst, Beijing, Peoples R China
[2] Idiap Res Inst, Martigny, Switzerland
[3] Beijing Inst Technol, Shenzhen Res Inst, Shenzhen, Switzerland
关键词
PHONE RECOGNITION; NEURAL-NETWORK; LANGUAGES;
D O I
暂无
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
Articulatory features (AFs) provide language-independent attribute by exploiting the speech production knowledge. This paper proposes a cross-lingual automatic speech recognition (ASR) based on AF methods. Various neural network (NN) architectures are explored to extract cross-lingual AFs and their performance is studied. The architectures include muti-layer perception(MLP), convolutional NN (CNN) and long short-term memory recurrent NN (LSTM). In our cross-lingual setup, only the source language (English, representing a well-resourced language) is used to train the AF extractors. AFs are then generated for the target language (Mandarin, representing an under-resourced language) using the trained extractors. The frame-classification accuracy indicates that the LSTM has an ability to perform a knowledge transfer through the robust cross-lingual AFs from well-resourced to under-resourced language. The final ASR system is built using traditional approaches (e.g. hybrid models), combining AFs with conventional MFCCs. The results demonstrate that the cross-lingual AFs improve the performance in under-resourced ASR task even though the source and target languages come from different language family. Overall, the proposed cross-lingual ASR approach provides slight improvement over the monolingual LF-MMI and cross-lingual (acoustic model adaptation-based) ASR systems.
引用
收藏
页码:1912 / 1916
页数:5
相关论文
共 50 条
  • [1] Cross-Lingual Automatic Speech Recognition Using Tandem Features
    Lal, Partha
    King, Simon
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (12): : 2506 - 2515
  • [2] Exploiting Adapters for Cross-Lingual Low-Resource Speech Recognition
    Hou, Wenxin
    Zhu, Han
    Wang, Yidong
    Wang, Jindong
    Qin, Tao
    Xu, Renju
    Shinozaki, Takahiro
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 317 - 329
  • [3] Polyglot Speech Synthesis Based on Cross-Lingual Frame Selection Using Auditory and Articulatory Features
    Chen, Chia-Ping
    Huang, Yi-Chin
    Wu, Chung-Hsien
    Lee, Kuan-De
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) : 1558 - 1570
  • [4] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 558 - 561
  • [5] IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS
    Le Minh Nguyen
    Nayak, Shekhar
    Coler, Matt
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 792 - 797
  • [6] Cross-lingual articulatory feature information transfer for speech recognition using recurrent progressive neural networks
    Morshed, Mahir
    Hasegawa-Johnson, Mark
    INTERSPEECH 2022, 2022, : 2298 - 2302
  • [7] Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
    Chatzoudis, Gerasimos
    Plitsis, Manos
    Stamouli, Spyridoula
    Dimou, Athanasia-Lida
    Katsamanis, Nassos
    Katsouros, Vassilis
    INTERSPEECH 2022, 2022, : 2178 - 2182
  • [8] CLIoS: Cross-lingual Induction of Speech Recognition Grammars
    Perera, Nadine
    Pitz, Michael
    Pinkal, Manfred
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 2487 - 2494
  • [9] Unsupervised Cross-lingual Representation Learning for Speech Recognition
    Conneau, Alexis
    Baevski, Alexei
    Collobert, Ronan
    Mohamed, Abdelrahman
    Auli, Michael
    INTERSPEECH 2021, 2021, : 2426 - 2430
  • [10] EXPLOITING CROSS DOMAIN ACOUSTIC-TO-ARTICULATORY INVERTED FEATURES FOR DISORDERED SPEECH RECOGNITION
    Hu, Shujie
    Liu, Shansong
    Xie, Xurong
    Geng, Mengzhe
    Wang, Tianzi
    Hu, Shoukang
    Cui, Mingyu
    Liu, Xunying
    Meng, Helen
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6747 - 6751