Dynamic visual features for audio-visual speaker verification

Cited by: 10
Authors
Dean, David [1 ]
Sridharan, Sridha [1 ]
Affiliation
[1] Queensland Univ Technol, Speech Audio Image & Video Res Lab, Brisbane, Qld 4001, Australia
Source
COMPUTER SPEECH AND LANGUAGE | 2010 / Vol. 24 / No. 2
Funding
Australian Research Council
Keywords
Audio-visual speaker recognition; Cascading appearance-based features; Synchronous hidden Markov models; LIP-MOTION; RECOGNITION;
DOI
10.1016/j.csl.2009.03.007
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features, showing that the higher complexity inherent in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance-level score fusion. (C) 2009 Elsevier Ltd. All rights reserved.
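The abstract contrasts SHMM-based fusion with simpler utterance-level score fusion but does not state the fusion rule used. As a rough illustration only, the sketch below shows one common form of utterance-level (late) fusion: each modality produces a single score per utterance, the scores are normalized against background scores, and a weighted sum is compared with a threshold. The function names, the z-normalization step, the weight `audio_weight`, and the threshold are all assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def z_normalize(score, background_scores):
    """Shift and scale a raw score using background (impostor) score statistics.

    Assumption: z-normalization is used here only as a generic way to put
    audio and visual scores on a comparable scale.
    """
    mu = mean(background_scores)
    sigma = pstdev(background_scores)
    if sigma == 0:
        return 0.0  # degenerate background set: no spread to scale by
    return (score - mu) / sigma


def fuse_scores(audio_score, visual_score, audio_weight=0.5):
    """Weighted-sum fusion of one audio and one visual score per utterance."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


def verify(audio_score, visual_score, threshold=0.0, audio_weight=0.5):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(audio_score, visual_score, audio_weight) >= threshold
```

In a sketch like this, `audio_weight` would typically be tuned on held-out data, and a value closer to 1.0 leans on the acoustic score when the video is unreliable.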
Pages: 136-149
Page count: 14