The use of temporal speech and lip information for multi-modal speaker identification via multi-stream HMM's

Cited: 0
Authors
Wark, T [1 ]
Sridharan, S [1 ]
Chandran, V [1 ]
Affiliations
[1] Queensland Univ Technol, Sch Elect Elect & Syst Engn, RCSAVT, Speech Res Lab, Brisbane, Qld 4001, Australia
Keywords
DOI
None available
CLC number
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This paper investigates the use of temporal lip information, in conjunction with speech information, for robust, text-dependent speaker identification. We propose that significant speaker-dependent information can be obtained from moving lips, enabling speaker recognition systems to remain highly robust in the presence of noise. The fusion structure for the audio and visual information is based on multi-stream hidden Markov models (MSHMMs), with audio and visual features forming two independent data streams. Recent work with multi-modal MSHMMs has been applied successfully to the task of speech recognition. Temporal lip information has been used for speaker identification previously; however, that work was restricted to output fusion via single-stream HMMs. We present an extension to this previous work and show that the MSHMM is a valid structure for multi-modal speaker identification.
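The abstract does not spell out the fusion rule, but a common decision rule for multi-stream HMMs combines per-stream log-likelihoods with exponent weights that sum to one, tuned (e.g. against audio SNR) on held-out data. A minimal sketch under that assumption, with all names and the toy scores hypothetical:

```python
# Sketch: weighted-stream score fusion, as commonly used with multi-stream HMMs.
# Assumes each speaker model has already produced one log-likelihood per stream;
# the stream exponents (lambda_audio, 1 - lambda_audio) would normally be tuned
# on held-out data, e.g. as a function of the audio noise level.

def fuse_stream_scores(audio_ll, video_ll, lambda_audio=0.5):
    """Combine per-speaker audio/video log-likelihoods with exponent weights."""
    lambda_video = 1.0 - lambda_audio
    return {spk: lambda_audio * audio_ll[spk] + lambda_video * video_ll[spk]
            for spk in audio_ll}

def identify_speaker(audio_ll, video_ll, lambda_audio=0.5):
    """Return the speaker whose fused score is highest."""
    fused = fuse_stream_scores(audio_ll, video_ll, lambda_audio)
    return max(fused, key=fused.get)

# Toy example: noisy audio slightly favours the wrong speaker, while the
# visual (lip) stream clearly favours the right one. The decision flips
# depending on how heavily the audio stream is weighted.
audio_ll = {"spk_a": -120.0, "spk_b": -118.0}   # noisy audio prefers spk_b
video_ll = {"spk_a": -40.0,  "spk_b": -55.0}    # lip stream prefers spk_a

print(identify_speaker(audio_ll, video_ll, lambda_audio=0.9))  # audio-dominated
print(identify_speaker(audio_ll, video_ll, lambda_audio=0.3))  # video-dominated
```

This illustrates the robustness claim in the abstract: when noise degrades the audio stream, shifting weight toward the visual stream can recover the correct identification.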
Pages: 2389-2392
Page count: 4
Related papers
50 items in total
  • [21] Evaluation of a noise-robust multi-stream speaker verification method using F0 information
    Asami, Taichi
    Iwano, Koji
    Furui, Sadaoki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (03): 549-557
  • [22] Multi-modal Fake News Detection on Social Media via Multi-grained Information Fusion
    Zhou, Yangming
    Yang, Yuzhou
    Ying, Qichao
    Qian, Zhenxing
    Zhang, Xinpeng
    [J]. PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023: 343-352
  • [23] Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs
    Wu, Tianyu
    Tang, Yang
    Sun, Qiyu
    Xiong, Luolin
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (05): 3044-3055
  • [24] Application of QR code as a watermark in multi-modal person's identification
    Velickovic, Zoran S.
    Velickovic, Sladana M.
    Velickovic, Marko Z.
    [J]. 2022 57TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATION, COMMUNICATION AND ENERGY SYSTEMS AND TECHNOLOGIES (ICEST), 2022: 135-138
  • [25] Multi-Stream Gated and Pyramidal Temporal Convolutional Neural Networks for Audio-Visual Speech Separation in Multi-Talker Environments
    Luo, Yiyu
    Wang, Jing
    Xu, Liang
    Yang, Lidong
    [J]. INTERSPEECH 2021, 2021: 1104-1108
  • [26] Multi-modal Gait Recognition via Effective Spatial-Temporal Feature Fusion
    Cui, Yufeng
    Kang, Yimei
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 17949-17957
  • [27] Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention
    Weilandt, Frank
    Behling, Robert
    Goncalves, Romulo
    Madadi, Arash
    Richter, Lorenz
    Sanona, Tiago
    Spengler, Daniel
    Welsch, Jona
    [J]. REMOTE SENSING, 2023, 15 (03)
  • [28] Early identification of stroke through deep learning with multi-modal human speech and movement data
    Zijun Ou
    Haitao Wang
    Bin Zhang
    Haobang Liang
    Bei Hu
    Longlong Ren
    Yanjuan Liu
    Yuhu Zhang
    Chengbo Dai
    Hejun Wu
    Weifeng Li
    Xin Li
    [J]. Neural Regeneration Research, 2025, 20 (01): 234-241
  • [29] Related recognition of one's body and consistency of multi-modal sensory information
    Fujimoto, Yuriko
    Murohashi, Harumitsu
    [J]. INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2016, 51: 1156-1156
  • [30] Multi-modal semantics fusion model for domain relation extraction via information bottleneck
    Tian, Zhao
    Zhao, Xuan
    Li, Xiwang
    Ma, Xiaoping
    Li, Yinghao
    Wang, Youwei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 244