The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM's

Cited: 0
Authors
Wark, T [1 ]
Sridharan, S [1 ]
Chandran, V [1 ]
Affiliations
[1] Queensland Univ Technol, Sch Elect Elect & Syst Engn, RCSAVT, Speech Res Lab, Brisbane, Qld 4001, Australia
Keywords
DOI
Not available
CLC number
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to remain highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been performed successfully for the task of speech recognition. Temporal lip information has been used for speaker identification before; however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that an MSHMM is a valid structure for multi-modal speaker identification.
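The fusion scheme the abstract describes can be sketched in a toy form. This is not the paper's implementation: the single-Gaussian-per-state models, the left-to-right topology, the stream weight `lam = 0.7`, and the synthetic features are all placeholder assumptions. The core MSHMM idea it illustrates is that each state combines the audio and visual emission log-likelihoods as a weighted sum before Viterbi decoding, and the identified speaker is the one whose model yields the highest fused score.

```python
# Toy sketch of multi-stream HMM (MSHMM) score fusion for speaker ID.
# All models and data below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)

def gauss_loglik(x, mean, var):
    """Per-frame log-likelihood of frames x (T x D) under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def make_model(offset, n_states=3, dim=2):
    """Hypothetical left-to-right HMM: one diagonal Gaussian per state and stream."""
    means = np.full((n_states, dim), offset) + np.arange(n_states)[:, None]
    return {
        "a_means": means, "a_vars": np.ones((n_states, dim)),   # audio stream
        "v_means": means, "v_vars": np.ones((n_states, dim)),   # visual stream
        "trans": np.array([[0.9, 0.1, 0.0],
                           [0.0, 0.9, 0.1],
                           [0.0, 0.0, 1.0]]),
    }

def mshmm_score(audio, visual, model, lam=0.7):
    """Viterbi log-score with per-state stream fusion:
       log b_j(o) = lam * log b_j^a(o_a) + (1 - lam) * log b_j^v(o_v)."""
    T, S = audio.shape[0], len(model["a_means"])
    emit = np.empty((T, S))  # fused emission log-likelihoods
    for j in range(S):
        la = gauss_loglik(audio, model["a_means"][j], model["a_vars"][j])
        lv = gauss_loglik(visual, model["v_means"][j], model["v_vars"][j])
        emit[:, j] = lam * la + (1 - lam) * lv
    logA = np.log(model["trans"] + 1e-300)
    delta = np.full(S, -np.inf)
    delta[0] = emit[0, 0]            # start in the first state
    for t in range(1, T):
        delta = np.max(delta[:, None] + logA, axis=0) + emit[t]
    return delta[-1]                 # best path ending in the final state

# Synthetic test utterance generated from speaker 0's model.
spk0, spk1 = make_model(0.0), make_model(3.0)
audio = np.vstack([rng.normal(m, 1.0, (10, 2)) for m in spk0["a_means"]])
visual = np.vstack([rng.normal(m, 1.0, (10, 2)) for m in spk0["v_means"]])
scores = {name: mshmm_score(audio, visual, m)
          for name, m in (("spk0", spk0), ("spk1", spk1))}
best = max(scores, key=scores.get)   # identified speaker
```

Because the two streams enter as an exponent-weighted product of emission densities (a weighted sum in the log domain), `lam` can be tuned toward the visual stream when the audio is noisy, which is the robustness argument the abstract makes.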
Pages: 2389-2392
Page count: 4
Related papers
50 records total
  • [1] Multi-Modal Multi-Stream UNET Model for Liver Segmentation
    Elghazy, Hagar Louye
    Fakhr, Mohamed Waleed
    [J]. 2021 IEEE WORLD AI IOT CONGRESS (AIIOT), 2021, : 28 - 33
  • [2] Automatic extraction of geometric lip features with application to multi-modal speaker identification
    Arsic, Ivana
    Vilagut, Roger
    Thiran, Jean-Philippe
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 161 - +
  • [3] Multi-stream HMM for EMG-based speech recognition
    Manabe, H
    Zhang, Z
    [J]. PROCEEDINGS OF THE 26TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2004, 26 : 4389 - 4392
  • [4] Rapid feature space speaker adaptation for multi-stream HMM-based audio-visual speech recognition
    Huang, J
    Marcheret, E
    Visweswariah, K
    [J]. 2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 338 - 341
  • [5] Multi-Stream Spectro-Temporal Features for Robust Speech Recognition
    Zhao, Sherry Y.
    Morgan, Nelson
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 898 - 901
  • [6] Why do multi-stream, multi-band and multi-modal approaches work on biometric user authentication tasks?
    Poh, N
    Bengio, S
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: DESIGN AND IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS INDUSTRY TECHNOLOGY TRACKS MACHINE LEARNING FOR SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING SIGNAL PROCESSING FOR EDUCATION, 2004, : 893 - 896
  • [7] Masking the feature information in multi-stream speech-analogue displays
    Divenyi, PL
    [J]. SPEECH SEPARATION BY HUMANS AND MACHINES, 2005, : 269 - 281
  • [8] Fused HMM-Adaptation of Multi-Stream HMMs for Audio-Visual Speech Recognition
    Dean, David
    Lucey, Patrick
    Sridharan, Sridha
    Wark, Tim
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2272 - 2275
  • [9] An Experimental Analysis on Integrating Multi-Stream Spectro-Temporal, Cepstral and Pitch Information for Mandarin Speech Recognition
    Wang, Yow-Bang
    Li, Shang-Wen
    Lee, Lin-shan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (10): : 2006 - 2014
  • [10] Combined Discriminative Training for Multi-Stream HMM-based Audio-Visual Speech Recognition
    Huang, Jing
    Visweswariah, Karthik
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1399 - +