A Deep Neural Network for Audio-Visual Person Recognition

Cited by: 0
Authors
Alam, Mohammad Rafiqul [1 ]
Bennamoun, Mohammed [1 ]
Togneri, Roberto [2 ]
Sohel, Ferdous [1 ]
Affiliations
[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia
[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia
Keywords
DIMENSIONALITY;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
This paper presents applications of special types of deep neural networks (DNNs) for audio-visual biometrics. A common example is the DBN-DNN, which uses the generative weights of deep belief networks (DBNs) to initialize the feature-detecting layers of deterministic feedforward DNNs. In this paper, we propose the DBM-DNN, which instead uses the generative weights of deep Boltzmann machines (DBMs) to initialize DNNs. A softmax layer is then added on top and the DNNs are trained discriminatively. Our experimental results show that the DBM-DNN achieves lower error rates than the support vector machine (SVM), the linear regression-based classifier (LRC) and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets: VidTIMIT and MOBIO.
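The pipeline described in the abstract (copy generatively pretrained weights into the hidden layers of a feedforward network, then add a softmax layer on top) can be illustrated with a minimal sketch. It is not the authors' implementation. The layer sizes, the helper names (random_layer, build_dnn, forward) and the use of random stand-in weights in place of real DBM or DBN pretraining are all assumptions made for illustration, and the discriminative training step is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def affine(x, W, b):
    # W is stored as a list of rows with shape (len(x), len(b)).
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]


def random_layer(n_in, n_out, rng, scale):
    # Hypothetical helper: Gaussian weights and zero biases.
    W = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out


def build_dnn(pretrained_layers, n_classes, rng):
    # Copy the pretrained weights into the hidden layers, then add a
    # freshly initialized softmax (output) layer on top.
    hidden = [([row[:] for row in W], b[:]) for W, b in pretrained_layers]
    n_top = len(hidden[-1][1])
    return hidden, random_layer(n_top, n_classes, rng, 0.01)


def forward(x, hidden, output):
    # Deterministic feedforward pass through sigmoid hidden layers.
    h = x
    for W, b in hidden:
        h = [sigmoid(v) for v in affine(h, W, b)]
    return softmax(affine(h, *output))


rng = random.Random(0)
sizes = [20, 16, 8]  # assumed sizes: 20 inputs, two hidden layers
# Stand-in for generative pretraining: random weights, NOT a trained DBM.
pretrained_layers = [random_layer(m, n, rng, 0.1)
                     for m, n in zip(sizes, sizes[1:])]
hidden, output = build_dnn(pretrained_layers, n_classes=5, rng=rng)
probs = forward([rng.gauss(0.0, 1.0) for _ in range(20)], hidden, output)
print(len(probs), round(sum(probs), 6))
```

In a full system, the hidden and output weights would then be fine-tuned discriminatively, for example by backpropagating a cross-entropy loss over identity labels. That step is left out here.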
Pages: 6