EXPLORING A ZERO-ORDER DIRECT HMM BASED ON LATENT ATTENTION FOR AUTOMATIC SPEECH RECOGNITION

Cited by: 0

Authors:

Bahar, Parnia [1,2]

Makarov, Nikita [1]

Zeyer, Albert [1,2]

Schlueter, Ralf [1,2]

Ney, Hermann [1,2]

Affiliations:

[1] RWTH Aachen University, Human Language Technology and Pattern Recognition Group, Computer Science Department, D-52074 Aachen, Germany

[2] AppTek GmbH, D-52062 Aachen, Germany

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

European Research Council

Keywords:

End-to-end speech recognition; Latent models; direct HMM; Attention; Transformer; LSTM; MODELS;

DOI:

10.1109/icassp40776.2020.9054545

Chinese Library Classification (CLC):

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper, we study a simple yet elegant latent variable attention model for automatic speech recognition (ASR) that integrates attention sequence modeling into the direct hidden Markov model (HMM) concept. We use a sequence of hidden variables that establishes a mapping from output labels to input frames. Inspired by the direct HMM, we assume a decomposition of the label sequence posterior into emission and transition probabilities using a zero-order assumption, and we incorporate both Transformer and LSTM attention models into it. The method keeps the explicit alignment as part of the stochastic model and combines the ease of end-to-end training of the attention model with an efficient and simple beam search. To study the effect of the latent model, we qualitatively analyze the alignment behavior of the different approaches. Our experiments on three ASR tasks show promising results in WER, with more focused alignments than those of the attention models.
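
The decomposition described in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes that, under the zero-order assumption, each label posterior is obtained by summing over candidate frames the product of a transition probability (which frame the label aligns to) and an emission probability (which label is produced at that frame). The function names, array shapes, and toy values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def label_posteriors(transition, emission):
    """Marginalize the latent alignment for every output position.

    transition: array of shape (N, T); row n is an assumed distribution
                over the T input frames for output position n.
    emission:   array of shape (N, T, V); emission[n, t] is an assumed
                distribution over V labels given alignment to frame t.
    Returns an (N, V) array with post[n, v] = sum_t transition[n, t] * emission[n, t, v].
    """
    return np.einsum("nt,ntv->nv", transition, emission)


def sequence_log_prob(transition, emission, labels):
    """Log-probability of a label sequence as a sum of per-position log posteriors."""
    post = label_posteriors(transition, emission)
    positions = np.arange(len(labels))
    return float(np.sum(np.log(post[positions, labels])))


# Toy example (hypothetical numbers): 2 output positions, 2 frames, 2 labels.
transition = np.array([[0.5, 0.5],
                       [1.0, 0.0]])
emission = np.array([[[1.0, 0.0], [0.0, 1.0]],
                     [[0.25, 0.75], [0.5, 0.5]]])

print(label_posteriors(transition, emission))
print(sequence_log_prob(transition, emission, [0, 1]))
```

In this sketch the sum over frames is taken independently at each output position, which is what makes the marginalization cheap. In an actual model the two probability tables would be produced by the attention network and conditioned on the label history rather than fixed in advance.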


Pages: 7854-7858

Number of pages: 5
