Discriminative analysis of lip motion features for speaker identification and speech-reading

Cited by: 70
Authors
Çetingül, H. Ertan [1]
Yemez, Yücel [1]
Erzin, Engin [1]
Tekalp, A. Murat [1]
Affiliation
[1] Koc Univ, Coll Engn, Multimedia Vis & Graph Lab, TR-34450 Istanbul, Turkey
Keywords
Bayesian discriminative feature selection; lip motion; speaker identification; speech recognition; temporal discriminative feature selection
DOI
10.1109/TIP.2006.877528
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage (spatial and temporal) discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.
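The abstract does not spell out the Bayesian criterion the paper uses, so the following is only a minimal sketch of what "highest discrimination of individual speakers" can mean for a single feature-selection stage. It swaps in a per-dimension Fisher discriminant ratio (weighted between-speaker variance of the class means over weighted within-speaker variance), which is a common textbook criterion and not the authors' Bayesian method. The function names `fisher_scores` and `select_top_k` and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(features, labels):
    """Per-dimension Fisher ratio (a stand-in criterion, not the paper's).

    features: list of equal-length feature vectors (e.g. lip motion features).
    labels:   speaker label for each vector.
    Higher scores mean the dimension separates speakers better.
    """
    classes = sorted(set(labels))
    n_total = len(features)
    scores = []
    for d in range(len(features[0])):
        column = [f[d] for f in features]
        overall = mean(column)
        between = 0.0  # weighted spread of class means around the global mean
        within = 0.0   # weighted average of per-class variances
        for c in classes:
            vals = [f[d] for f, y in zip(features, labels) if y == c]
            weight = len(vals) / n_total
            between += weight * (mean(vals) - overall) ** 2
            within += weight * pvariance(vals)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_k(features, labels, k):
    """Return indices of the k most discriminative dimensions."""
    scores = fisher_scores(features, labels)
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return ranked[:k]


# Toy data: dimension 0 separates speakers A and B, dimension 1 is mostly noise.
features = [[0.0, 1.0], [0.2, 3.0], [5.0, 2.0], [5.2, 0.0]]
labels = ["A", "A", "B", "B"]
print(select_top_k(features, labels, 1))  # prints [0]
```

A temporal stage in the same spirit could score features computed over windows of consecutive frames rather than single frames, but how the paper combines its spatial and temporal stages is not stated in the abstract.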
Pages: 2879-2891
Number of pages: 13