Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

Cited by: 0
Authors
Fan, Peng [1]
Guo, Dongyue [1]
Zhang, Jianwei [1, 2]
Yang, Bo [1, 2]
Lin, Yi [1, 2]
Affiliations
[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Air traffic control; Multilingual; Speech recognition; FiLM conditioning; End-to-end speech recognition; Neural networks
DOI
10.1016/j.apacoust.2024.110123
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Automatic speech recognition (ASR) is increasingly used to improve the efficiency and safety of air traffic control (ATC) operations. However, controllers and pilots converse in multiple languages, which makes it challenging to build high-accuracy ASR systems. In this work, we present a two-stage multilingual ASR framework. The first stage trains a language identifier model (LIM), based on a recurrent neural network (RNN), to obtain sentence-level language identification (SLID) in the form of a one-hot encoding. The second stage trains an RNN-based end-to-end multilingual recognition model that uses the SLID generated by the LIM to enhance the input features. We introduce Feature-wise Linear Modulation (FiLM) to improve multilingual ASR performance by exploiting the SLID. Furthermore, we introduce a new learning module called SLIL, which consists of a FiLM layer and a Squeeze-and-Excitation Networks layer. Extensive experiments on the ATCSpeech dataset show that the proposed method outperforms the baseline model. Compared with the vanilla FiLMed backbone model, the proposed multilingual ASR model achieves a relative improvement of about 7.50% in character error rate.
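As a rough illustration of the conditioning idea in the abstract, the sketch below shows how a one-hot SLID vector could modulate acoustic features through a FiLM step (scale and shift per feature) followed by squeeze-and-excitation gating. It is a minimal sketch, not the authors' implementation: the function names, the layer sizes, the weight values, and the order in which the two operations are combined are all assumptions made for illustration.

```python
import math


def one_hot(index, num_languages):
    """Return a one-hot list encoding a sentence-level language ID."""
    return [1.0 if i == index else 0.0 for i in range(num_languages)]


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def film(frames, slid, gamma_weights, beta_weights):
    """FiLM: compute gamma and beta from the SLID, then apply
    gamma * x + beta to every feature of every frame."""
    gamma = linear(gamma_weights, slid)
    beta = linear(beta_weights, slid)
    return [[g * x + b for g, x, b in zip(gamma, frame, beta)]
            for frame in frames]


def squeeze_excite(frames, w_down, w_up):
    """Squeeze-and-excitation over the feature axis of a
    (time x feature) input."""
    num_frames = len(frames)
    # Squeeze: average each feature over time.
    squeezed = [sum(col) / num_frames for col in zip(*frames)]
    # Excite: bottleneck with ReLU, then a sigmoid gate per feature.
    hidden = [max(0.0, h) for h in linear(w_down, squeezed)]
    gates = [1.0 / (1.0 + math.exp(-u)) for u in linear(w_up, hidden)]
    return [[g * x for g, x in zip(gates, frame)] for frame in frames]


# Hypothetical toy setup: 2 languages, 3 features, 2 frames.
# Each weight matrix has one row per feature and one column per language,
# so a one-hot SLID simply selects one column.
gamma_weights = [[1.0, 2.0], [1.0, 0.5], [1.0, -1.0]]
beta_weights = [[0.0, 0.1], [0.0, 0.2], [0.0, 0.3]]
frames = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

modulated = film(frames, one_hot(1, 2), gamma_weights, beta_weights)

# Assumed bottleneck of size 1 (3 features -> 1 -> 3 features).
w_down = [[0.1, 0.1, 0.1]]
w_up = [[0.5], [-0.5], [0.0]]
gated = squeeze_excite(modulated, w_down, w_up)
```

Because the SLID is one-hot, the FiLM step here amounts to looking up a language-specific scale and shift for each feature; in a trained model those parameters would be learned jointly with the recognizer rather than set by hand.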
Pages: 10