Discriminative analysis of lip motion features for speaker identification and speech-reading

Cited by: 70
Authors
Cetinguel, H. Ertan [1]
Yemez, Yuecel [1]
Erzin, Engin [1]
Tekalp, A. Murat [1]
Affiliation
[1] Koc Univ, Coll Engn, Multimedia Vis & Graph Lab, TR-34450 Istanbul, Turkey
Keywords
Bayesian discriminative feature selection; lip motion; speaker identification; speech recognition; temporal discriminative feature selection
DOI
10.1109/TIP.2006.877528
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Several studies have jointly used audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important questions: 1) Is explicit lip motion information useful? 2) If so, what are the best lip motion features for these two applications? For speaker identification, the best lip motion features are taken to be those that yield the highest discrimination between individual speakers in a population; for speech-reading, the best features are those that provide the highest phoneme/word/phrase recognition rate. Several candidate lip motion features are considered, including dense motion features within a bounding box around the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading. Experimental results using a hidden-Markov-model-based recognition system indicate that explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable for speech-reading.
Pages: 2879-2891
Number of pages: 13
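The abstract defines the best speaker-identification features as those giving the highest discrimination between individual speakers. As a rough illustration of ranking candidate feature dimensions by class discrimination, the sketch below scores each dimension with a Fisher ratio (between-class scatter over within-class scatter) and keeps the top k. This is a generic, swapped-in criterion chosen only for illustration: it is not the paper's Bayesian discriminative feature selection, and it omits the temporal stage entirely. The function names `fisher_ratio` and `select_top_k` and the synthetic data are hypothetical.

```python
import numpy as np


def fisher_ratio(features: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-dimension Fisher ratio: between-class scatter / within-class scatter.

    Generic criterion (assumption): NOT the paper's Bayesian method.
    features: (n_samples, n_dims); labels: (n_samples,) class ids,
    e.g. speaker ids (identification) or word ids (speech-reading).
    """
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x_c = features[labels == c]          # samples of class c
        mean_c = x_c.mean(axis=0)
        between += x_c.shape[0] * (mean_c - overall_mean) ** 2
        within += ((x_c - mean_c) ** 2).sum(axis=0)
    # Guard against division by zero for constant dimensions.
    return between / np.maximum(within, 1e-12)


def select_top_k(features: np.ndarray, labels: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring dimensions, best first (hypothetical helper)."""
    scores = fisher_ratio(features, labels)
    return np.argsort(scores)[::-1][:k]


# Hypothetical synthetic data: 3 classes, dimension 0 is pure noise,
# dimension 1 has well-separated class means.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 50)
noise = rng.normal(0.0, 1.0, labels.size)
informative = labels * 3.0 + rng.normal(0.0, 0.5, labels.size)
X = np.column_stack([noise, informative])

top = select_top_k(X, labels, 1)
print(top)
```

Because dimension 1 carries the class information, it should rank first. Swapping speaker ids for phoneme or word ids as `labels` changes which criterion the ranking serves, loosely mirroring the two application-specific criteria in the abstract.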