Discriminative analysis of lip motion features for speaker identification and speech-reading

Cited by: 70

Authors:

Cetinguel, H. Ertan [1]

Yemez, Yuecel [1]

Erzin, Engin [1]

Tekalp, A. Murat [1]

Affiliation:

[1] Koc Univ, Coll Engn, Multimedia Vis & Graph Lab, TR-34450 Istanbul, Turkey

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2006 / Volume 15 / Issue 10

Keywords:

Bayesian discriminative feature selection; lip motion; speaker identification; speech recognition; temporal discriminative feature selection;

DOI:

10.1109/TIP.2006.877528

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and that lip motion features prove more valuable in the speech-reading application.

Pages: 2879-2891

Page count: 13
