Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

Authors
Fan, Peng [1]
Guo, Dongyue [1]
Zhang, Jianwei [1, 2]
Yang, Bo [1, 2]
Lin, Yi [1, 2]
Affiliations
[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Air traffic control; Multilingual; Speech recognition; FiLM conditioning; End-to-end speech recognition; Neural networks
DOI
10.1016/j.apacoust.2024.110123
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Automatic speech recognition (ASR) is increasingly used to improve the efficiency and safety of air traffic control (ATC) operations. However, controllers and pilots converse in more than one language, which makes it challenging to build high-accuracy ASR systems. In this work, we present a two-stage multilingual ASR framework. The first stage trains a language identifier model (LIM), based on a recurrent neural network (RNN), to produce a sentence-level language identification (SLID) in the form of a one-hot encoding. The second stage trains an RNN-based end-to-end multilingual recognition model that uses the SLID generated by the LIM to enhance the input features. We introduce Feature-wise Linear Modulation (FiLM) to improve multilingual ASR performance by exploiting the SLID. Furthermore, we introduce a new learning module called SLIL, which consists of a FiLM layer and a Squeeze-and-Excitation Networks layer. Extensive experiments on the ATCSpeech dataset show that the proposed method outperforms the baseline model. Compared to the vanilla FiLMed backbone model, the proposed multilingual ASR model achieves a relative character error rate improvement of about 7.50%.
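As a rough illustration of the conditioning idea in the abstract, the sketch below applies FiLM (a per-channel scale and shift predicted from a one-hot SLID vector) followed by a Squeeze-and-Excitation gate to a small feature map. It is not the authors' SLIL module: the layer order, the shapes, the random weights, the language index, and every function name here are assumptions made for illustration only.

```python
import math
import random


def one_hot(index, num_classes):
    """Return a one-hot list, standing in for the SLID vector."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def film(features, slid, w_gamma, w_beta):
    """FiLM: scale and shift each channel with parameters predicted from slid.

    features: T frames x C channels; w_gamma and w_beta: C x num_languages.
    """
    gamma = matvec(w_gamma, slid)  # per-channel scale
    beta = matvec(w_beta, slid)    # per-channel shift
    return [[g * x + b for g, x, b in zip(gamma, frame, beta)]
            for frame in features]


def squeeze_excite(features, w_down, w_up):
    """Squeeze-and-Excitation over the channel axis of a T x C feature map."""
    num_frames = len(features)
    # Squeeze: average each channel over time.
    squeezed = [sum(column) / num_frames for column in zip(*features)]
    # Excite: bottleneck layer with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, h) for h in matvec(w_down, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-u)) for u in matvec(w_up, hidden)]
    # Rescale every frame channel-wise by its gate.
    return [[s * x for s, x in zip(gates, frame)] for frame in features]


rng = random.Random(0)
T, C, L, R = 5, 4, 2, 2  # frames, channels, languages, bottleneck width (all assumed)


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


features = rand_matrix(T, C)
w_gamma, w_beta = rand_matrix(C, L), rand_matrix(C, L)
w_down, w_up = rand_matrix(R, C), rand_matrix(C, R)

slid = one_hot(0, L)  # hypothetical language index 0
out = squeeze_excite(film(features, slid, w_gamma, w_beta), w_down, w_up)
print(len(out), len(out[0]))
```

Because the SLID is one-hot, the FiLM projection reduces to selecting one column of each weight matrix, so each language effectively owns its own scale and shift vector.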
Pages: 10