NEURAL LATTICE SEARCH FOR SPEECH RECOGNITION

Cited by: 0

Authors:

Ma, Rao [1]

Li, Hao [1]

Liu, Qi [1]

Chen, Lu [1]

Yu, Kai [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, SpeechLab, Shanghai, Peoples R China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

speech recognition; word lattice; lattice-to-sequence; attention models; forward-backward algorithm;

DOI:

10.1109/icassp40776.2020.9054109

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

To improve the accuracy of automatic speech recognition, a two-pass decoding strategy is widely adopted. The first-pass model generates compact word lattices, which the second-pass model uses for rescoring. Currently, the most popular rescoring methods are N-best rescoring and lattice rescoring with long short-term memory language models (LSTMLMs). However, these methods suffer from a limited search space or from inconsistency between training and evaluation. In this paper, we address these problems with an end-to-end model for accurately extracting the best hypothesis from the word lattice. Our model is composed of a bidirectional LatticeLSTM encoder followed by an attentional LSTM decoder. The model takes a word lattice as input and generates the single best hypothesis from the given lattice space. When combined with an LSTMLM, the proposed model yields 9.7% and 7.5% relative WER reductions compared to N-best rescoring and lattice rescoring methods, respectively, within the same amount of decoding time.
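To make the terms in the abstract concrete, the following is a minimal sketch of a word lattice as a weighted directed acyclic graph, with the single best hypothesis extracted by dynamic programming. It is not the neural model proposed in the paper: the toy lattice, its scores, and the names `EDGES`, `best_path`, and `relative_wer_reduction` are all hypothetical, chosen only to illustrate what "the search space of a lattice" and "relative WER reduction" mean.

```python
from collections import defaultdict

# Hypothetical toy lattice: each edge is (start_node, end_node, word, log_score).
# Node ids are assumed to be integers in topological order; 0 is the start, 3 the end.
EDGES = [
    (0, 1, "recognize", -1.2),
    (0, 1, "wreck a nice", -0.9),
    (1, 2, "speech", -0.4),
    (1, 2, "beach", -0.7),
    (2, 3, "</s>", -0.1),
]


def best_path(edges, start, end):
    """Return (score, words) for the highest-scoring path from start to end."""
    incoming = defaultdict(list)
    for src, dst, word, score in edges:
        incoming[dst].append((src, word, score))

    best = {start: (0.0, [])}
    # Visiting nodes in increasing id order is valid only because ids are
    # assumed to be topologically sorted (an assumption of this sketch).
    for node in sorted(incoming):
        candidates = [
            (best[src][0] + score, best[src][1] + [word])
            for src, word, score in incoming[node]
            if src in best
        ]
        if candidates:
            best[node] = max(candidates, key=lambda c: c[0])
    return best[end]


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative reduction of word error rate with respect to a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    score, words = best_path(EDGES, 0, 3)
    print(score, words)
    # Hypothetical WER values, not results from the paper.
    print(relative_wer_reduction(10.0, 9.0))
```

The toy lattice encodes four hypotheses in five edges, which is why a lattice offers a larger search space than a short N-best list of the same size. In a rescoring setup the edge scores would come from the first-pass model and a second-pass language model rather than being fixed constants.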


Pages: 7794-7798

Number of pages: 5

