Enhancing multilingual speech recognition in air traffic control by sentence-level language identification

Cited by: 0

Authors:

Fan, Peng ^{[1]}

Guo, Dongyue ^{[1]}

Zhang, Jianwei ^{[1,2]}

Yang, Bo ^{[1,2]}

Lin, Yi ^{[1,2]}

Affiliations:

[1] Sichuan Univ, Natl Key Lab Fundamental Sci Synthet Vis, Chengdu 610065, Peoples R China

[2] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

Source:

APPLIED ACOUSTICS | 2024 / Vol. 224

Keywords:

Air traffic control; Multilingual; Speech recognition; FiLM conditioning; End-to-end speech recognition; NEURAL-NETWORKS;

DOI:

10.1016/j.apacoust.2024.110123

Chinese Library Classification:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Automatic speech recognition (ASR) is increasingly used to improve the efficiency and safety of air traffic control (ATC) operations. However, controllers and pilots converse in multiple languages, which makes building high-accuracy ASR systems challenging. In this work, we present a two-stage multilingual ASR framework. The first stage trains a language identifier model (LIM), based on a recurrent neural network (RNN), to produce sentence-level language identification (SLID) in the form of a one-hot encoding. The second stage trains an RNN-based end-to-end multilingual recognition model that uses the SLID generated by the LIM to enhance the input features. We introduce Feature-wise Linear Modulation (FiLM) to improve multilingual ASR performance by exploiting the SLID. Furthermore, we introduce a new learning module called SLIL, which consists of a FiLM layer and a Squeeze-and-Excitation Networks layer. Extensive experiments on the ATCSpeech dataset show that the proposed method outperforms the baseline model. Compared with the vanilla FiLMed backbone model, the proposed multilingual ASR model achieves a relative character error rate improvement of about 7.50%.
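
The abstract names the building blocks (a one-hot SLID vector, a FiLM layer, a Squeeze-and-Excitation layer) but does not say how they are wired together. The following is a minimal sketch of one plausible arrangement, assuming FiLM is applied per feature channel and followed by a channel gate pooled over time. The function names `film` and `squeeze_excite`, the layer order, the sizes, and the random weights are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def film(frames, lang_onehot, w_gamma, w_beta):
    """FiLM: scale and shift every channel using the conditioning vector.

    frames: list of T frames, each a list of C channel values.
    lang_onehot: one-hot language vector of length L (the SLID).
    w_gamma, w_beta: L x C matrices mapping the SLID to per-channel
    scale (gamma) and shift (beta). Hypothetical parameters.
    """
    n_ch = len(w_gamma[0])
    gamma = [sum(l * row[c] for l, row in zip(lang_onehot, w_gamma))
             for c in range(n_ch)]
    beta = [sum(l * row[c] for l, row in zip(lang_onehot, w_beta))
            for c in range(n_ch)]
    return [[g * v + b for g, b, v in zip(gamma, beta, frame)]
            for frame in frames]


def squeeze_excite(frames, w1, w2):
    """Squeeze-and-Excitation: gate channels by a pooled summary.

    Squeeze: average each channel over time.
    Excite: ReLU bottleneck (C -> R), then sigmoid gate (R -> C).
    """
    n_ch = len(frames[0])
    squeezed = [sum(f[c] for f in frames) / len(frames) for c in range(n_ch)]
    hidden = [max(0.0, sum(s * w1[c][r] for c, s in enumerate(squeezed)))
              for r in range(len(w1[0]))]
    gate = [1.0 / (1.0 + math.exp(-sum(h * w2[r][c]
                                       for r, h in enumerate(hidden))))
            for c in range(n_ch)]
    return [[g * v for g, v in zip(gate, frame)] for frame in frames]


rng = random.Random(0)


def rand_matrix(rows, cols):
    # Random stand-ins for weights that would normally be learned.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


T, C, L, R = 5, 4, 2, 2            # frames, channels, languages, bottleneck
frames = rand_matrix(T, C)         # stand-in for acoustic features
lang_onehot = [0.0, 1.0]           # SLID: utterance is in language index 1
w_gamma, w_beta = rand_matrix(L, C), rand_matrix(L, C)
w1, w2 = rand_matrix(C, R), rand_matrix(R, C)

modulated = film(frames, lang_onehot, w_gamma, w_beta)
output = squeeze_excite(modulated, w1, w2)
```

Because the SLID is one-hot, the FiLM step reduces to selecting one row of `w_gamma` and `w_beta` per utterance, so each language gets its own per-channel scale and shift.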

Pages: 10

50 entries in total

[1] Enhancing multilingual recognition of emotion in speech by language identification
Sagha, Hesam
Matejka, Pavel
Gavryukova, Maryna
Povolny, Filip
Marchi, Erik
Schuller, Bjoern
[J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016 : 2949 - 2953
[3] Phoneme and Sentence-Level Ensembles for Speech Recognition
Dimitrakakis, Christos
Bengio, Samy
[J]. EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2011
[4] A Unified Framework for Multilingual Speech Recognition in Air Traffic Control Systems
Lin, Yi
Guo, Dongyue
Zhang, Jianwei
Chen, Zhengmao
Yang, Bo
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (08) : 3608 - 3620
[5] Sentence-Level Sign Language Recognition Using RF signals
Meng, Xianjia
Feng, Lin
Yin, Xiao
Zhou, Huanting
Sheng, Chang
Wang, Chongyang
Du, Anxun
Xu, Linzhi
[J]. 2019 6TH INTERNATIONAL CONFERENCE ON BEHAVIORAL, ECONOMIC AND SOCIO-CULTURAL COMPUTING (BESC 2019), 2019
[6] Does Sentence-Level Coarticulation Affect Speech Recognition in Noise or a Speech Masker?
Jett, Brandi
Buss, Emily
Best, Virginia
Oleson, Jacob
Calandruccio, Lauren
[J]. JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2021, 64 (04) : 1390 - 1403
[7] Towards multilingual end-to-end speech recognition for air traffic control
Lin, Yi
Yang, Bo
Guo, Dongyue
Fan, Peng
[J]. IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (09) : 1203 - 1214
[8] Analyzing Continuous-Time and Sentence-Level Annotations for Speech Emotion Recognition
Martinez-Lucas, Luz
Lin, Wei-Cheng
Busso, Carlos
[J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (03) : 1754 - 1768
[9] Sentence-Level Automatic Speech Segmentation for Amharic
Tamiru, Rahel Mekonen
Abate, Solomon Teferra
[J]. PROCEEDINGS OF SIXTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICICT 2021), VOL 2, 2022, 236 : 477 - 485
[10] A unified system for multilingual speech recognition and language identification
Liu, Danyang
Xu, Ji
Zhang, Pengyuan
Yan, Yonghong
[J]. SPEECH COMMUNICATION, 2021, 127 : 17 - 28
