Audiovisual Speaker Identification Based on Lip and Speech Modalities

Cited by: 0

Authors:

Chelali, Fatma [1]

Djeradi, Amar [1]

Affiliations:

[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria

Source:

INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY | 2017 / Vol. 14 / No. 01

Keywords:

Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, we present a bimodal speaker identification method that integrates acoustic and visual features, with the audio and visual streams processed in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. Results show the importance of the visual information provided by the Discrete Cosine Transform (DCT) and Discrete Wavelet Transform (DWT), in addition to the audio information carried by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks such as the Multilayer Perceptron (MLP) and Radial Basis Function (RBF) network were tested successfully on this dataset, giving good recognition performance with serial concatenation of the acoustic and visual vectors.
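The serial concatenation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dct_1d`, `dct_2d`, `visual_features`, `fuse`), the 16x24 lip region, the 4x4 block of retained DCT coefficients, and the 13-element acoustic vector standing in for an MFCC frame are all assumptions chosen for illustration. It shows only the general idea of taking low-frequency 2D DCT coefficients of a lip image and appending them to an acoustic feature vector.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(image):
    """Separable 2D DCT: transform each row, then each column."""
    rows = [dct_1d(r) for r in image]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def visual_features(lip_roi, keep=4):
    """Keep the top-left keep x keep block of low-frequency coefficients."""
    coeffs = dct_2d(lip_roi)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def fuse(acoustic_vec, visual_vec):
    """Serial concatenation: acoustic features followed by visual features."""
    return list(acoustic_vec) + list(visual_vec)

# Hypothetical inputs: a synthetic 16x24 grayscale lip region and a
# placeholder 13-element acoustic vector (not real MFCC values).
lip_roi = [[((r * 24 + c) % 7) / 7.0 for c in range(24)] for r in range(16)]
acoustic_vec = [0.1 * i for i in range(13)]

fused = fuse(acoustic_vec, visual_features(lip_roi))
print(len(fused))  # 13 acoustic + 16 visual = 29
```

In a setup like this, the fused vector would be the input to a classifier such as an MLP or RBF network. Feature scaling between the two modalities is omitted here and would normally need attention, since the acoustic and visual coefficients can have very different ranges.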


Pages: 99-110

Page count: 12
