The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM's

Cited: 0
Authors
Wark, T [1 ]
Sridharan, S [1 ]
Chandran, V [1 ]
Affiliations
[1] Queensland Univ Technol, Sch Elect Elect & Syst Engn, RCSAVT, Speech Res Lab, Brisbane, Qld 4001, Australia
Keywords
DOI
Not available
CLC number
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to remain highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. Temporal lip information has been used for speaker identification before; however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
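The fusion scheme the abstract describes can be sketched in a toy form. This is not the paper's implementation: the single-Gaussian-per-state models, the left-to-right topology, the stream weight `lam = 0.7`, and the synthetic features are all placeholder assumptions. The core MSHMM idea it illustrates is that each state combines the audio and visual emission log-likelihoods as a weighted sum before Viterbi decoding, and the identified speaker is the one whose model yields the highest fused score.

```python
# Toy sketch of multi-stream HMM (MSHMM) score fusion for speaker ID.
# All models and data below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)

def gauss_loglik(x, mean, var):
    """Per-frame log-likelihood of frames x (T x D) under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def make_model(offset, n_states=3, dim=2):
    """Hypothetical left-to-right HMM: one diagonal Gaussian per state and stream."""
    means = np.full((n_states, dim), offset) + np.arange(n_states)[:, None]
    return {
        "a_means": means, "a_vars": np.ones((n_states, dim)),   # audio stream
        "v_means": means, "v_vars": np.ones((n_states, dim)),   # visual stream
        "trans": np.array([[0.9, 0.1, 0.0],
                           [0.0, 0.9, 0.1],
                           [0.0, 0.0, 1.0]]),
    }

def mshmm_score(audio, visual, model, lam=0.7):
    """Viterbi log-score with per-state stream fusion:
       log b_j(o) = lam * log b_j^a(o_a) + (1 - lam) * log b_j^v(o_v)."""
    T, S = audio.shape[0], len(model["a_means"])
    emit = np.empty((T, S))  # fused emission log-likelihoods
    for j in range(S):
        la = gauss_loglik(audio, model["a_means"][j], model["a_vars"][j])
        lv = gauss_loglik(visual, model["v_means"][j], model["v_vars"][j])
        emit[:, j] = lam * la + (1 - lam) * lv
    logA = np.log(model["trans"] + 1e-300)
    delta = np.full(S, -np.inf)
    delta[0] = emit[0, 0]            # start in the first state
    for t in range(1, T):
        delta = np.max(delta[:, None] + logA, axis=0) + emit[t]
    return delta[-1]                 # best path ending in the final state

# Synthetic test utterance generated from speaker 0's model.
spk0, spk1 = make_model(0.0), make_model(3.0)
audio = np.vstack([rng.normal(m, 1.0, (10, 2)) for m in spk0["a_means"]])
visual = np.vstack([rng.normal(m, 1.0, (10, 2)) for m in spk0["v_means"]])
scores = {name: mshmm_score(audio, visual, m)
          for name, m in (("spk0", spk0), ("spk1", spk1))}
best = max(scores, key=scores.get)   # identified speaker
```

Because the two streams enter as an exponent-weighted product of emission densities (a weighted sum in the log domain), `lam` can be tuned toward the visual stream when the audio is noisy, which is the robustness argument the abstract makes.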
Pages: 2389-2392
Page count: 4
Related papers
50 records total
  • [1] Multi-Modal Multi-Stream UNET Model for Liver Segmentation
    Elghazy, Hagar Louye
    Fakhr, Mohamed Waleed
    [J]. 2021 IEEE WORLD AI IOT CONGRESS (AIIOT), 2021, : 28 - 33
  • [2] Automatic extraction of geometric lip features with application to multi-modal speaker identification
    Arsic, Ivana
    Vilagut, Roger
    Thiran, Jean-Philippe
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 161 - +
  • [3] Multi-stream HMM for EMG-based speech recognition
    Manabe, H
    Zhang, Z
    [J]. PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2004, 26 : 4389 - 4392
  • [4] Rapid feature space speaker adaptation for multi-stream HMM-based audio-visual speech recognition
    Huang, J
    Marcheret, E
    Visweswariah, K
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 338 - 341
  • [5] Multi-Stream Spectro-Temporal Features for Robust Speech Recognition
    Zhao, Sherry Y.
    Morgan, Nelson
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 898 - 901
  • [6] Why do multi-stream, multi-band and multi-modal approaches work on biometric user authentication tasks?
    Poh, N
    Bengio, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: DESIGN AND IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS INDUSTRY TECHNOLOGY TRACKS MACHINE LEARNING FOR SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING SIGNAL PROCESSING FOR EDUCATION, 2004, : 893 - 896
  • [7] Masking the feature information in multi-stream speech-analogue displays
    Divenyi, PL
    [J]. SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 269 - 281
  • [8] Fused HMM-Adaptation of Multi-Stream HMMs for Audio-Visual Speech Recognition
    Dean, David
    Lucey, Patrick
    Sridharan, Sridha
    Wark, Tim
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2272 - 2275
  • [9] An Experimental Analysis on Integrating Multi-Stream Spectro-Temporal, Cepstral and Pitch Information for Mandarin Speech Recognition
    Wang, Yow-Bang
    Li, Shang-Wen
    Lee, Lin-shan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2006 - 2014
  • [10] Combined Discriminative Training for Multi-Stream HMM-based Audio-Visual Speech Recognition
    Huang, Jing
    Visweswariah, Karthik
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1399 - +