A Deep Neural Network for Audio-Visual Person Recognition

Cited by: 0
Authors
Alam, Mohammad Rafiqul [1]
Bennamoun, Mohammed [1]
Togneri, Roberto [2]
Sohel, Ferdous [1]
Affiliations
[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia
[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia
Keywords
DIMENSIONALITY
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper presents applications of special types of deep neural networks (DNNs) to audio-visual biometrics. A common example is the DBN-DNN, which uses the generative weights of deep belief networks (DBNs) to initialize the feature-detecting layers of deterministic feed-forward DNNs. In this paper, we propose the DBM-DNN, which uses the generative weights of deep Boltzmann machines (DBMs) to initialize DNNs. A softmax layer is then added on top and the DNNs are trained discriminatively. Our experimental results show that lower error rates can be achieved with the DBM-DNN than with the support vector machine (SVM), the linear regression-based classifier (LRC) and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets: VidTIMIT and MOBIO.
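The two-stage recipe in the abstract (generatively pretrained weights seed the hidden layers of a feed-forward network, a softmax layer is added on top, and the whole network is then trained discriminatively) can be illustrated with a small pure-Python sketch. It rests on assumptions that do not come from the paper: the generative stage is a stand-in that stacks binary restricted Boltzmann machines trained greedily with one-step contrastive divergence (CD-1), which corresponds to DBN-style initialization rather than the jointly trained DBM used by the proposed DBM-DNN; the helper names (pretrain_rbm, build_network, finetune_step), layer sizes, learning rates and toy binary data are hypothetical; and no VidTIMIT or MOBIO features are involved.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, x):
    # W holds one row per output unit; returns W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def pretrain_rbm(data, n_hidden, rng, epochs=20, lr=0.1):
    """Stand-in generative stage (assumption): a binary RBM trained with CD-1.
    This is DBN-style pretraining, NOT the paper's DBM training.
    Returns (W, b_hidden) so the layer can seed a feed-forward network."""
    n_vis = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_vis)] for _ in range(n_hidden)]
    b_v, b_h = [0.0] * n_vis, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            h0 = [sigmoid(a) for a in affine(W, b_h, v0)]
            h0_s = [1.0 if rng.random() < p else 0.0 for p in h0]  # sample hidden states
            # reconstruction of the visible layer: sigmoid(W^T h + b_v)
            v1 = [sigmoid(sum(W[j][i] * h0_s[j] for j in range(n_hidden)) + b_v[i])
                  for i in range(n_vis)]
            h1 = [sigmoid(a) for a in affine(W, b_h, v1)]
            for j in range(n_hidden):
                for i in range(n_vis):
                    W[j][i] += lr * (h0[j] * v0[i] - h1[j] * v1[i])
                b_h[j] += lr * (h0[j] - h1[j])
            for i in range(n_vis):
                b_v[i] += lr * (v0[i] - v1[i])
    return W, b_h


def build_network(data, hidden_sizes, n_classes, rng):
    """Greedy layer-wise pretraining, then a randomly initialised softmax layer on top."""
    layers, x = [], data
    for n_h in hidden_sizes:
        W, b = pretrain_rbm(x, n_h, rng)
        layers.append((W, b))
        x = [[sigmoid(a) for a in affine(W, b, v)] for v in x]  # feed mean activations upward
    W_out = [[rng.gauss(0.0, 0.01) for _ in range(hidden_sizes[-1])] for _ in range(n_classes)]
    return layers, W_out, [0.0] * n_classes


def forward(layers, W_out, b_out, x):
    acts = [x]
    for W, b in layers:
        acts.append([sigmoid(a) for a in affine(W, b, acts[-1])])
    logits = affine(W_out, b_out, acts[-1])
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    total = sum(exps)
    return acts, [e / total for e in exps]


def finetune_step(layers, W_out, b_out, x, label, lr=0.1):
    """Discriminative stage: one SGD step on cross-entropy; returns the loss before the step."""
    acts, probs = forward(layers, W_out, b_out, x)
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]  # dLoss/dlogits
    h = acts[-1]
    delta_h = [sum(W_out[k][j] * delta[k] for k in range(len(delta))) * h[j] * (1.0 - h[j])
               for j in range(len(h))]
    for k, d in enumerate(delta):
        for j in range(len(h)):
            W_out[k][j] -= lr * d * h[j]
        b_out[k] -= lr * d
    delta = delta_h
    for l in range(len(layers) - 1, -1, -1):
        W, b = layers[l]
        inp = acts[l]
        # error for the layer below, computed before this layer's weights change
        # (unused once the input layer is reached)
        below = [sum(W[j][i] * delta[j] for j in range(len(delta))) * inp[i] * (1.0 - inp[i])
                 for i in range(len(inp))]
        for j, d in enumerate(delta):
            for i in range(len(inp)):
                W[j][i] -= lr * d * inp[i]
            b[j] -= lr * d
        delta = below
    return -math.log(probs[label])


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: two "identities" with distinct binary feature patterns.
    X = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1]]
    y = [0, 0, 1, 1]
    layers, W_out, b_out = build_network(X, [4, 3], 2, rng)
    for _ in range(300):
        loss = sum(finetune_step(layers, W_out, b_out, x, t) for x, t in zip(X, y))
    print("final training loss:", round(loss, 4))
```

The sketch keeps the generative and discriminative stages in separate functions, so a different pretraining method could, in principle, supply the initial layer weights without changing the fine-tuning code.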
Pages: 6