Audiovisual Speaker Identification Based on Lip and Speech Modalities

Times cited: 0
Authors
Chelali, Fatma [1]
Djeradi, Amar [1]
Affiliation
[1] Univ Sci & Technol Houari Boumedienne, Fac Elect Engn & Comp Sci, Algiers, Algeria
Keywords
Audiovisual speaker recognition; DCT; DWT; PLP; MFCC; RECOGNITION; INFORMATION; FUSION
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a bimodal speaker identification method that integrates acoustic and visual features and processes the two audiovisual streams in parallel. We also propose a fusion technique that combines the two modalities to make the final recognition decision. Experiments are conducted on an audiovisual dataset containing the 28 Arabic syllables pronounced by ten speakers. The results show the value of the visual information provided by the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) in addition to the audio information given by Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) coefficients. Furthermore, artificial neural networks, such as the Multilayer Perceptron (MLP) and the Radial Basis Function (RBF) network, were investigated on this dataset and achieved good recognition performance when the acoustic and visual feature vectors were serially concatenated.
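The abstract describes visual features taken from a DCT of the lip region and fused with acoustic features by serial concatenation. The sketch below shows only that idea and should not be read as the authors' implementation. It assumes the lip region has already been located, cropped, and converted to grayscale. It keeps the top-left k x k block of low-frequency DCT coefficients as a simple stand-in for the usual zigzag scan. It also takes a placeholder 13-value vector in place of real MFCCs, which would be computed elsewhere. The helper names (`dct2`, `low_freq_coeffs`, `fuse_serial`) are hypothetical. DWT features, PLP coefficients, and the MLP/RBF classifiers are not shown.

```python
import math
from typing import List, Sequence


def dct2(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of a small grayscale patch (pure Python, O(N^2 M^2))."""
    n, m = len(block), len(block[0])
    out = [[0.0] * m for _ in range(n)]
    for u in range(n):
        a_u = math.sqrt((1 if u == 0 else 2) / n)  # orthonormal scale for row frequency u
        for v in range(m):
            a_v = math.sqrt((1 if v == 0 else 2) / m)  # orthonormal scale for column frequency v
            total = 0.0
            for x in range(n):
                cos_x = math.cos(math.pi * (2 * x + 1) * u / (2 * n))
                for y in range(m):
                    total += block[x][y] * cos_x * math.cos(math.pi * (2 * y + 1) * v / (2 * m))
            out[u][v] = a_u * a_v * total
    return out


def low_freq_coeffs(coeffs: Sequence[Sequence[float]], k: int) -> List[float]:
    """Keep the k x k top-left (low-frequency) coefficients, row by row.
    Assumption: a simple substitute for a zigzag scan."""
    return [coeffs[u][v] for u in range(k) for v in range(k)]


def fuse_serial(acoustic: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Feature-level fusion by serial concatenation: [acoustic..., visual...]."""
    return list(acoustic) + list(visual)


if __name__ == "__main__":
    # Hypothetical 8x8 grayscale lip patch and a placeholder 13-value "MFCC" vector.
    lip = [[((x * 8 + y) % 17) / 16.0 for y in range(8)] for x in range(8)]
    mfcc = [0.1 * i for i in range(13)]
    visual = low_freq_coeffs(dct2(lip), k=4)  # 16 visual features
    fused = fuse_serial(mfcc, visual)
    print(len(fused))  # 13 + 16 = 29
```

DCT coefficients and MFCCs lie on different numeric scales. In practice, each dimension of the fused vector would usually be normalized over the training set before it is fed to an MLP or RBF network.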
Pages: 99-110
Page count: 12