END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS

Citations: 0

Authors:

Petridis, Stavros [1]

Li, Zuwei [1]

Pantic, Maja [1,2]

Affiliations:

[1] Imperial Coll London, Dept Comp, London, England

[2] Univ Twente, EEMCS, Enschede, Netherlands

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2017

Keywords:

Visual Speech Recognition; Lipreading; End-to-End Training; Long-Short Term Recurrent Neural Networks; Deep Networks;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Traditional visual speech recognition systems consist of two stages: feature extraction and classification. Recently, several deep learning approaches have been presented that automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on joint learning of features and classification is very limited. In this work, we present an end-to-end visual speech recognition system based on Long Short-Term Memory (LSTM) networks. To the best of our knowledge, this is the first model that simultaneously learns to extract features directly from the pixels and perform classification, and that also achieves state-of-the-art performance in visual speech classification. The model consists of two streams, which extract features directly from the mouth images and the difference images, respectively. The temporal dynamics in each stream are modelled by an LSTM, and the two streams are fused via a Bidirectional LSTM (BLSTM). An absolute improvement of 9.7% over the baseline is reported on the OuluVS2 database, and of 1.5% on the CUAVE database, when compared with other methods that use a similar visual front-end.
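The second stream described in the abstract takes difference images as input. The abstract does not define how these are computed, so the following is only a minimal sketch under the assumption that a difference image is the pixelwise difference between consecutive mouth-region frames. The function name `difference_images`, the `Frame` type alias, and the toy 2x2 frames are hypothetical and are not taken from the paper.

```python
from typing import List

# One grayscale mouth-region image, stored as rows of pixel values.
Frame = List[List[float]]


def difference_images(frames: List[Frame]) -> List[Frame]:
    """Return pixelwise differences between consecutive frames.

    Assumption: a difference image is frames[t] - frames[t - 1]. The abstract
    does not spell this out, so this is one common interpretation.
    """
    diffs = []
    for prev, curr in zip(frames, frames[1:]):
        diffs.append([
            [c - p for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(prev, curr)
        ])
    return diffs


# Three hypothetical 2x2 frames standing in for cropped mouth images.
frames = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[1.0, 1.0], [2.0, 5.0]],
    [[1.0, 0.0], [4.0, 5.0]],
]
diffs = difference_images(frames)
```

Under this assumption, a sequence of T frames yields T - 1 difference images, so the two streams would need to be aligned in length (for example by dropping the first raw frame) before their outputs are fused.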


Pages: 2592-2596

Page count: 5

