Speaker-Independent Speech Recognition using Visual Features

Authors
Pooventhiran, G. [1]
Sandeep, A. [1]
Manthiravalli, K. [1]
Harish, D. [1]
Renuka, Karthika D. [1]
Affiliations
[1] PSG Coll Technol, Dept Informat Technol, Coimbatore 641004, Tamil Nadu, India
Keywords
Visual speech recognition; audio speech recognition; visemes; lip reading system; Convolutional Neural Network (CNN)
DOI
10.14569/IJACSA.2020.0111175
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Visual speech recognition aims to transcribe lip movements into readable text. Automatic speech recognition systems have made great strides and can now recognize words from audio and visual speech features, even under noisy conditions. This paper focuses only on the visual features, whereas a robust system would use visual features to support acoustic features. We propose concatenating visemes (lip movements) for text classification rather than the classic mapping of individual visemes. The results show that this approach achieves a significant improvement over state-of-the-art models. The system has two modules: the first extracts lip features from the input video, and the second is a neural network trained to process the viseme sequence and classify it as text.
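As a rough illustration of the viseme-concatenation idea in the abstract, the sketch below tiles a handful of lip crops from one utterance side by side into a single image that a classifier could then label as a word. This is only one possible reading: the abstract does not state how the visemes are concatenated, how many are used, or what the crop size is, so the function name `concatenate_visemes`, the `num_visemes` parameter, the even temporal sampling, and the horizontal layout are all assumptions made for illustration. Lip detection and the neural network classifier are omitted.

```python
import numpy as np


def concatenate_visemes(lip_frames, num_visemes=8):
    """Tile evenly spaced lip crops side by side into one image.

    lip_frames: array-like of shape (T, H, W), one grayscale lip crop
    per video frame (hypothetical input format, not from the paper).
    Returns an array of shape (H, num_visemes * W).
    """
    frames = np.asarray(lip_frames)
    if frames.ndim != 3 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty array of shape (T, H, W)")

    # Assumption: sample the utterance at evenly spaced time steps.
    idx = np.linspace(0, frames.shape[0] - 1, num_visemes).round().astype(int)

    # Join the selected crops along the width axis so the whole
    # sequence is presented to the classifier as a single input.
    return np.concatenate(list(frames[idx]), axis=1)


# Example: 20 synthetic 4x6 "lip crops" reduced to one 4x30 image.
video = np.arange(20 * 4 * 6).reshape(20, 4, 6)
tiled = concatenate_visemes(video, num_visemes=5)
print(tiled.shape)
```

Feeding the tiled image to a single classifier, instead of labeling each crop separately, is what lets the model see the whole lip-movement sequence at once; whether the paper's system does exactly this is not confirmed by the abstract.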
Pages: 616-620
Number of pages: 5