A Deep Neural Network for Audio-Visual Person Recognition

Times cited: 0

Authors:

Alam, Mohammad Rafiqul [1]

Bennamoun, Mohammed [1]

Togneri, Roberto [2]

Sohel, Ferdous [1]

Affiliations:

[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia

[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia

Source:

2015 IEEE 7TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS 2015) | 2015

Keywords:

DIMENSIONALITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory; Methods];

Subject classification code:

081202;

Abstract:

This paper presents applications of special types of deep neural networks (DNNs) to audio-visual biometrics. A common example is the DBN-DNN, which uses the generative weights of deep belief networks (DBNs) to initialize the feature-detecting layers of deterministic feed-forward DNNs. In this paper, we propose the DBM-DNN, which uses the generative weights of deep Boltzmann machines (DBMs) to initialize DNNs. A softmax layer is then added on top, and the DNNs are trained discriminatively. Our experimental results show that lower error rates can be achieved with the DBM-DNN than with the support vector machine (SVM), the linear regression-based classifier (LRC), and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets, VidTIMIT and MOBIO.
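The abstract's pipeline (generative pretraining, weight transfer into a feed-forward DNN, a softmax layer on top, then discriminative training) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the pretraining step here is greedy layer-wise stacking of Bernoulli RBMs trained with CD-1, i.e. the DBN-style initialization the abstract names as the common baseline. The proposed DBM-DNN would instead take its initial weights from a jointly trained DBM, which this sketch does not implement. The layer sizes, learning rates, epoch counts, and the synthetic binary "identity" data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=30, lr=0.1, batch_size=10, seed=0):
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    b_v, b_h = np.zeros(data.shape[1]), np.zeros(n_hidden)
    for _ in range(epochs):
        order = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            v0 = data[order[start:start + batch_size]]
            p_h0 = sigmoid(v0 @ W + b_h)                        # positive phase
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # sample hiddens
            p_v1 = sigmoid(h0 @ W.T + b_v)                      # reconstruction
            p_h1 = sigmoid(p_v1 @ W + b_h)                      # negative phase
            W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
            b_v += lr * (v0 - p_v1).mean(axis=0)
            b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h  # generative weights and hidden biases reused by the DNN

def pretrain_stack(x, layer_sizes):
    """Greedy layer-wise RBM stacking (DBN-style); returns one (W, b) per layer."""
    params, h = [], x
    for size in layer_sizes:
        W, b = train_rbm(h, size)
        params.append((W, b))
        h = sigmoid(h @ W + b)  # mean hidden activations feed the next RBM
    return params

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)

def forward(params, W_out, b_out, x):
    acts = [x]
    for W, b in params:
        acts.append(sigmoid(acts[-1] @ W + b))
    return acts, softmax(acts[-1] @ W_out + b_out)

def finetune(params, x, y, n_classes, epochs=300, lr=0.5, seed=0):
    """Add a softmax layer on top, then backpropagate cross-entropy through all layers."""
    rng = np.random.default_rng(seed)
    params = [(W.copy(), b.copy()) for W, b in params]  # keep pretrained copies intact
    W_out = 0.01 * rng.standard_normal((params[-1][0].shape[1], n_classes))
    b_out = np.zeros(n_classes)
    targets = np.eye(n_classes)[y]
    for _ in range(epochs):
        acts, probs = forward(params, W_out, b_out, x)
        delta = (probs - targets) / len(x)  # d(mean cross-entropy)/d(logits)
        grad_W_out, grad_b_out = acts[-1].T @ delta, delta.sum(axis=0)
        delta = (delta @ W_out.T) * acts[-1] * (1 - acts[-1])
        W_out -= lr * grad_W_out
        b_out -= lr * grad_b_out
        for i in range(len(params) - 1, -1, -1):
            W, b = params[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # propagate using the pre-update weights
                delta = (delta @ W.T) * acts[i] * (1 - acts[i])
            params[i] = (W - lr * grad_W, b - lr * grad_b)
    return params, W_out, b_out

# Hypothetical toy data: 3 "identities", each a random binary prototype
# with 10% of its bits flipped per sample.
data_rng = np.random.default_rng(42)
n_ids, per_id, dim = 3, 60, 20
prototypes = (data_rng.random((n_ids, dim)) < 0.5).astype(float)
y = np.repeat(np.arange(n_ids), per_id)
flips = data_rng.random((len(y), dim)) < 0.1
x = np.abs(prototypes[y] - flips)  # XOR of prototype bits with the flip mask

params = pretrain_stack(x, [32, 16])               # generative initialization
params, W_out, b_out = finetune(params, x, y, n_ids)  # discriminative training
_, probs = forward(params, W_out, b_out, x)
print("training accuracy:", (probs.argmax(axis=1) == y).mean())
```

Only `pretrain_stack` would change to move from this DBN-style initialization to the DBM-DNN the abstract proposes; the weight transfer, the added softmax layer, and the discriminative fine-tuning stay the same.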


Pages: 6

