A new manifold representation for visual speech recognition

Cited by: 0
Authors
Yu, Dahai [1]
Ghita, Ovidiu [1]
Sutherland, Alistair [1]
Whelan, Paul F. [1]
Affiliations
[1] Dublin City Univ, Vis Syst Grp, Sch Elect & Comp Engn, Dublin 9, Ireland
Keywords
visual speech recognition; PCA manifolds; spline interpolation; k-nearest neighbour; hidden Markov model
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a new manifold representation for visual speech recognition. The real-time input video data is compressed using Principal Component Analysis (PCA), and the low-dimensional points calculated for each frame define the manifolds. Since the number of frames that form the video sequence depends on the word complexity, using these manifolds for visual speech classification requires re-sampling them into a fixed number of keypoints that serve as input for classification. Two classification schemes are evaluated: the k-Nearest Neighbour (kNN) algorithm used in conjunction with two-stage PCA, and a Hidden Markov Model (HMM) classifier. Results for a group of English words indicate that the proposed approach classifies them accurately.
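The pipeline outlined in the abstract (PCA compression of each frame, then re-sampling each manifold to a fixed number of keypoints) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the 64-value frames, the 3 principal components, and the 20 keypoints are hypothetical, and plain linear interpolation (`np.interp`) is swapped in for the spline interpolation listed in the keywords.

```python
import numpy as np


def fit_pca(frames, n_components=3):
    """Learn a PCA basis from flattened frames (one frame per row)."""
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(frames, mean, components):
    """Map each frame to a low-dimensional point; the points trace the manifold."""
    return (np.asarray(frames, dtype=float) - mean) @ components.T


def resample_manifold(points, n_keypoints=20):
    """Re-sample a variable-length trajectory to a fixed number of keypoints.

    Assumption: linear interpolation per dimension stands in for a spline.
    """
    points = np.asarray(points, dtype=float)
    t_old = np.linspace(0.0, 1.0, len(points))
    t_new = np.linspace(0.0, 1.0, n_keypoints)
    return np.column_stack(
        [np.interp(t_new, t_old, points[:, d]) for d in range(points.shape[1])]
    )


# Hypothetical data: two "words" with different numbers of 64-value frames.
rng = np.random.default_rng(0)
short_word = rng.random((12, 64))
long_word = rng.random((31, 64))

mean, components = fit_pca(np.vstack([short_word, long_word]))
features = [
    resample_manifold(project(seq, mean, components)).ravel()
    for seq in (short_word, long_word)
]
```

After re-sampling, both sequences yield feature vectors of the same length, which is what allows a fixed-input classifier such as kNN to compare words of different durations.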
Pages: 374-382
Page count: 9
Related papers
  • [1] A new manifold representation for visual speech recognition
    Yu, Dahai
    Ghita, Ovidiu
    Sutherland, Alistair
    Whelan, Paul F.
    IMVIP 2007: INTERNATIONAL MACHINE VISION AND IMAGE PROCESSING CONFERENCE, PROCEEDINGS, 2007, : 210 - 210
  • [2] A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition
    Yu, Dahai
    Ghita, Ovidiu
    Sutherland, Alistair
    Whelan, Paul F.
    ADVANCES IN IMAGE AND VIDEO TECHNOLOGY, PROCEEDINGS, 2009, 5414 : 398 - 409
  • [3] Crosslingual and Multilingual Speech Recognition Based on the Speech Manifold
    Sahraeian, Reza
    Van Compernolle, Dirk
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (12) : 2301 - 2312
  • [4] Under-resourced Speech Recognition based on the Speech Manifold
    Sahraeian, Reza
    Van Compernolle, Dirk
    de Wet, Febe
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1255 - 1259
  • [5] Multistream sparse representation features for noise robust audio-visual speech recognition
    Shen, Peng
    Tamura, Satoshi
    Hayamizu, Satoru
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2014, 35 (01) : 17 - 27
  • [6] A NEW PHASE-BASED FEATURE REPRESENTATION FOR ROBUST SPEECH RECOGNITION
    Loweimi, Erfan
    Ahad, Seyed Mohammad
    Drugman, Thomas
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7155 - 7159
  • [7] Histogram equalization of speech representation for robust speech recognition
    de la Torre, A
    Peinado, AM
    Segura, JC
    Pérez-Córdoba, JL
    Benítez, MC
    Rubio, AJ
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03) : 355 - 366
  • [8] THE INTERVALGRAM AS A VISUAL REPRESENTATION OF SPEECH SOUNDS
    CHANG, SH
    PIHL, GE
    WIREN, J
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1951, 23 (06) : 674 - 679
  • [9] Manifold HLDA and its application to robust speech recognition
    Kubo, Toshiaki
    Ogawa, Tetsuji
    Kobayashi, Tetsunori
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1551 - 1554
  • [10] Feature representation for speech emotion Recognition
    Abdollahpour, Mehdi
    Zamani, Jafar
    Rad, Hamidreza Saligheh
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 1465 - 1468