A Deep Neural Network for Audio-Visual Person Recognition

Authors
Alam, Mohammad Rafiqul [1]
Bennamoun, Mohammed [1]
Togneri, Roberto [2]
Sohel, Ferdous [1]
Affiliations
[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia
[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia
Keywords
DIMENSIONALITY;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
This paper presents applications of special types of deep neural networks (DNNs) for audio-visual biometrics. A common example is the DBN-DNN, which uses the generative weights of deep belief networks (DBNs) to initialize the feature-detecting layers of deterministic feedforward DNNs. In this paper, we propose the DBM-DNN, which uses the generative weights of deep Boltzmann machines (DBMs) to initialize DNNs. A softmax layer is then added on top and the DNNs are trained discriminatively. Our experimental results show that the DBM-DNN achieves lower error rates than the support vector machine (SVM), the linear regression-based classifier (LRC) and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets: VidTIMIT and MOBIO.
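The initialization scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the generative pretraining step (DBN or DBM) is not shown, and random matrices stand in for the pretrained weights. The layer sizes, learning rate, toy data, and the helper names `init_dnn`, `forward`, and `train_step` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_dnn(pretrained, n_classes, rng):
    """Copy pretrained (W, b) pairs into the hidden layers and add a
    randomly initialized softmax layer on top (hypothetical helper)."""
    hidden = [(W.copy(), b.copy()) for W, b in pretrained]
    top_width = hidden[-1][0].shape[1]
    W_out = 0.01 * rng.standard_normal((top_width, n_classes))
    b_out = np.zeros(n_classes)
    return hidden, (W_out, b_out)

def forward(x, hidden, out):
    """Deterministic feedforward pass; returns all activations and class probabilities."""
    acts = [x]
    for W, b in hidden:
        acts.append(sigmoid(acts[-1] @ W + b))
    probs = softmax(acts[-1] @ out[0] + out[1])
    return acts, probs

def train_step(x, y, hidden, out, lr=0.5):
    """One full-batch gradient step on the cross-entropy loss.
    Updates all weights in place and returns the loss before the update."""
    acts, probs = forward(x, hidden, out)
    n = x.shape[0]
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    delta = probs.copy()
    delta[np.arange(n), y] -= 1.0
    delta /= n
    W_out, b_out = out
    grad_back = delta @ W_out.T           # computed before W_out is changed
    W_out -= lr * acts[-1].T @ delta
    b_out -= lr * delta.sum(axis=0)
    for i in range(len(hidden) - 1, -1, -1):
        W, b = hidden[i]
        delta = grad_back * acts[i + 1] * (1.0 - acts[i + 1])  # sigmoid derivative
        grad_back = delta @ W.T
        W -= lr * acts[i].T @ delta
        b -= lr * delta.sum(axis=0)
    return loss

rng = np.random.default_rng(0)
# Placeholder for weights that would come from generative pretraining (assumption).
pretrained = [
    (0.1 * rng.standard_normal((20, 16)), np.zeros(16)),
    (0.1 * rng.standard_normal((16, 8)), np.zeros(8)),
]
hidden, out = init_dnn(pretrained, n_classes=3, rng=rng)

# Toy data: 60 feature vectors, with labels derived from the first feature.
x = rng.standard_normal((60, 20))
y = np.digitize(x[:, 0], [-0.5, 0.5])
losses = [train_step(x, y, hidden, out) for _ in range(300)]
```

In this sketch the discriminative stage updates every layer, so the pretrained weights only serve as a starting point. Swapping in weights from an actual DBN or DBM would change only the `pretrained` list.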
Pages: 6