END-TO-END VISUAL SPEECH RECOGNITION WITH LSTMS

Cited by: 0
Authors
Petridis, Stavros [1 ]
Li, Zuwei [1 ]
Pantic, Maja [1 ,2 ]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Twente, EEMCS, Enschede, Netherlands
Keywords
Visual Speech Recognition; Lipreading; End-to-End Training; Long-Short Term Recurrent Neural Networks; Deep Networks
DOI
Not available
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Traditional visual speech recognition systems consist of two stages: feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on joint learning of features and classification is very limited. In this work, we present an end-to-end visual speech recognition system based on Long Short-Term Memory (LSTM) networks. To the best of our knowledge, this is the first model which simultaneously learns to extract features directly from the pixels and perform classification, and which also achieves state-of-the-art performance in visual speech classification. The model consists of two streams which extract features directly from the mouth and difference images, respectively. The temporal dynamics in each stream are modelled by an LSTM, and the fusion of the two streams takes place via a Bidirectional LSTM (BLSTM). An absolute improvement of 9.7% over the baseline is reported on the OuluVS2 database, and of 1.5% on the CUAVE database, when compared with other methods which use a similar visual front-end.
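The two-stream architecture in the abstract can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: layer sizes, the single-linear-layer encoders standing in for the paper's deep feature-extraction front-end, and the choice to classify from the last fused timestep are all assumptions for illustration only.

```python
import torch
import torch.nn as nn

class TwoStreamLipreader(nn.Module):
    """Sketch of the abstract's two-stream model: each stream encodes its
    input (raw mouth pixels / frame-difference images) and models its
    temporal dynamics with an LSTM; a bidirectional LSTM fuses the streams
    before classification. All sizes are illustrative placeholders."""

    def __init__(self, input_dim=1600, hidden=256, num_classes=10):
        super().__init__()
        # one encoder + LSTM per stream; the paper uses a deeper encoder,
        # here a single linear layer stands in
        self.enc_raw = nn.Linear(input_dim, hidden)
        self.enc_diff = nn.Linear(input_dim, hidden)
        self.lstm_raw = nn.LSTM(hidden, hidden, batch_first=True)
        self.lstm_diff = nn.LSTM(hidden, hidden, batch_first=True)
        # fusion of the concatenated stream outputs via a BLSTM
        self.blstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                             bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, raw, diff):
        # raw, diff: (batch, time, pixels) flattened mouth / difference images
        h_raw, _ = self.lstm_raw(torch.tanh(self.enc_raw(raw)))
        h_diff, _ = self.lstm_diff(torch.tanh(self.enc_diff(diff)))
        fused, _ = self.blstm(torch.cat([h_raw, h_diff], dim=-1))
        # classify from the last fused timestep (one label per clip)
        return self.classifier(fused[:, -1])

# usage: 2 clips, 30 frames, 40x40-pixel mouth ROIs flattened to 1600 dims
x = torch.randn(2, 30, 1600)
diff = x.diff(dim=1, prepend=x[:, :1])   # frame-difference stream
logits = TwoStreamLipreader()(x, diff)
print(logits.shape)                      # (2, num_classes)
```

End-to-end training then backpropagates the classification loss through both LSTMs and encoders jointly, which is the "joint learning of features and classification" the abstract contrasts with two-stage pipelines.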
Pages: 2592-2596 (5 pages)