A Deep Neural Network for Audio-Visual Person Recognition

被引:0
|
作者
Alam, Mohammad Rafiqul [1 ]
Bennamoun, Mohammed [1 ]
Togneri, Roberto [2 ]
Sohel, Ferdous [1 ]
机构
[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia
[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia
关键词
DIMENSIONALITY;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
This paper presents applications of special types of deep neural networks (DNNs) for audio-visual biometrics. A common example is the DBN-DNN that uses the generative weights of deep belief networks (DBNs) to initialize the feature detecting layers of deterministic feed forward DNNs. In this paper, we propose the DBM-DNN that uses the generative weights of deep Boltzmann machines (DBMs) for initialization of DNNs. Then, a softmax layer is added on top and the DNNs are trained discriminatively. Our experimental results show that lower error rates can be achieved using the DBM-DNN compared to the support vector machine (SVM), linear regression-based classifier (LRC) and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets: the VidTIMIT and MOBIO.
引用
收藏
页数:6
相关论文
共 50 条
  • [31] Detecting Audio-Visual Synchrony Using Deep Neural Networks
    Marcheret, Etienne
    Potamianos, Gerasimos
    Vopicka, Josef
    Goel, Vaibhava
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 548 - 552
  • [32] Audio-Visual Speech Enhancement using Deep Neural Networks
    Hou, Jen-Cheng
    Wang, Syu-Siang
    Lai, Ying-Hui
    Lin, Jen-Chun
    Tsao, Yu
    Chang, Hsiu-Wen
    Wang, Hsin-Min
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [33] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    [J]. VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
  • [34] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION
    Zhang, Zi-Qiang
    Zhang, Jie
    Zhang, Jian-Shu
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1346 - 1350
  • [35] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    [J]. INFORMATION FUSION, 2022, 85 : 52 - 59
  • [36] Audio-Visual Action Recognition Using Transformer Fusion Network
    Kim, Jun-Hwa
    Won, Chee Sun
    [J]. APPLIED SCIENCES-BASEL, 2024, 14 (03):
  • [37] Audio-Visual Sensor Fusion Framework Using Person Attributes Robust to Missing Visual Modality for Person Recognition
    John, Vijay
    Kawanishi, Yasutomo
    [J]. MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 523 - 535
  • [38] Multimodal Sparse Transformer Network for Audio-Visual Speech Recognition
    Song, Qiya
    Sun, Bin
    Li, Shutao
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10028 - 10038
  • [39] AUDIO-VISUAL PERSON RECOGNITION IN MULTIMEDIA DATA FROM THE IARPA JANUS PROGRAM
    Sell, Gregory
    Duh, Kevin
    Snyder, David
    Etter, Dave
    Garcia-Romero, Daniel
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 3031 - 3035
  • [40] A generative approach to audio-visual person tracking
    Brunelli, Roberto
    Brutti, Alessio
    Chippendale, Paul
    Lanz, Oswald
    Omologo, Maurizio
    Svaizer, Piergiorgio
    Tobia, Francesco
    [J]. MULTIMODAL TECHNOLOGIES FOR PERCEPTION OF HUMANS, 2007, 4122 : 55 - 68