LANGUAGE INDEPENDENT END-TO-END ARCHITECTURE FOR JOINT LANGUAGE IDENTIFICATION AND SPEECH RECOGNITION

Cited by: 0

Authors:

Watanabe, Shinji [1]

Hori, Takaaki [1]

Hershey, John R. [1]

Affiliations:

[1] Mitsubishi Elect Res Labs, Cambridge, MA 02139 USA

Source:

2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2017

Keywords:

End-to-end ASR; multilingual ASR; language-independent architecture; language identification; hybrid attention/CTC;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity, which we fully exploit in this paper, to build a monolithic multilingual ASR system with a language-independent neural network architecture. We present a model that can recognize speech in 10 different languages, by directly performing grapheme (character/chunked-character) based speech recognition. The model is based on our hybrid attention/connectionist temporal classification (CTC) architecture which has previously been shown to achieve the state-of-the-art performance in several ASR benchmarks. Here we augment its set of output symbols to include the union of character sets appearing in all the target languages. These include Roman and Cyrillic Alphabets, Arabic numbers, simplified Chinese, and Japanese Kanji/Hiragana/Katakana characters (5,500 characters in all). This allows training of a single multilingual model, whose parameters are shared across all the languages. The model can jointly identify the language and recognize the speech, automatically formatting the recognized text in the appropriate character set. The experiments, which used speech databases composed of Wall Street Journal (English), Corpus of Spontaneous Japanese, HKUST Mandarin CTS, and Voxforge (German, Spanish, French, Italian, Dutch, Portuguese, Russian), demonstrate comparable/superior performance relative to language-dependent end-to-end ASR systems.
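The shared output layer described in the abstract can be illustrated with a small sketch: one symbol table is built as the union of the characters seen in every language's transcripts, so a single model can emit text in any of the scripts. The special symbols (`<blank>`, `<sos/eos>`, `<space>`) and the bracketed language tags such as `[JP]` placed in front of each target sequence are assumptions made for illustration; the abstract states only that the model identifies the language jointly with recognition, not how that is encoded. The function names and toy transcripts are likewise hypothetical.

```python
from typing import Dict, Iterable, List


def build_union_vocab(corpora: Dict[str, Iterable[str]]) -> List[str]:
    """Build one output symbol table shared by all languages."""
    # Assumed special symbols; the names are illustrative only.
    specials = ["<blank>", "<sos/eos>"]
    # Assumed language-ID symbols, one per language.
    lang_tags = [f"[{lang}]" for lang in sorted(corpora)]
    chars = set()
    for transcripts in corpora.values():
        for text in transcripts:
            # Spaces are mapped to a dedicated symbol, so skip them here.
            chars.update(text.replace(" ", ""))
    return specials + lang_tags + ["<space>"] + sorted(chars)


def encode(lang: str, text: str, index: Dict[str, int]) -> List[int]:
    """Turn a transcript into label IDs, prefixed by its language tag."""
    tokens = [f"[{lang}]"] + ["<space>" if c == " " else c for c in text]
    return [index[t] for t in tokens]


# Toy transcripts standing in for the real corpora.
corpora = {
    "EN": ["hello world"],
    "JP": ["こんにちは"],
    "RU": ["привет"],
}
vocab = build_union_vocab(corpora)
index = {symbol: i for i, symbol in enumerate(vocab)}
ids = encode("JP", "こんにちは", index)
```

Because every language shares the same table, adding a language under this scheme only extends the symbol list; the rest of the network is unchanged.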


Pages: 265-271

Page count: 7
