A Deep Neural Network for Audio-Visual Person Recognition

Cited by: 0

Authors:

Alam, Mohammad Rafiqul [1]

Bennamoun, Mohammed [1]

Togneri, Roberto [2]

Sohel, Ferdous [1]

Affiliations:

[1] Univ Western Australia, Sch Comp Sci & Software Engn, Crawley, WA 6009, Australia

[2] Univ Western Australia, Sch Elect Elect & Comp Engn, Crawley, WA 6009, Australia

Source:

2015 IEEE 7TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS 2015) | 2015

Keywords:

DIMENSIONALITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

This paper presents applications of special types of deep neural networks (DNNs) for audio-visual biometrics. A common example is the DBN-DNN, which uses the generative weights of deep belief networks (DBNs) to initialize the feature-detecting layers of deterministic feed-forward DNNs. In this paper, we propose the DBM-DNN, which instead uses the generative weights of deep Boltzmann machines (DBMs) to initialize DNNs. A softmax layer is then added on top, and the DNNs are trained discriminatively. Our experimental results show that the DBM-DNN achieves lower error rates than the support vector machine (SVM), the linear regression-based classifier (LRC), and the DBN-DNN. Experiments were carried out on two publicly available audio-visual datasets: VidTIMIT and MOBIO.
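
The initialize-then-fine-tune recipe described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no layer sizes, learning rates, or details of how the DBM weights are mapped onto the feed-forward layers, so the sketch simply copies one weight matrix per hidden layer. The `pretrained` arrays are random placeholders standing in for generative weights that would come from a trained DBM, and all function names (`init_dnn`, `forward`, `train_step`) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def init_dnn(pretrained, n_classes, rng):
    """Copy pretrained (W, b) pairs into the hidden layers and
    add a freshly initialised softmax layer on top."""
    hidden = [(W.copy(), b.copy()) for W, b in pretrained]
    n_top = hidden[-1][0].shape[1]
    W_out = 0.01 * rng.standard_normal((n_top, n_classes))
    b_out = np.zeros(n_classes)
    return hidden, (W_out, b_out)

def forward(x, hidden, out):
    """Deterministic feed-forward pass; returns all activations and class probabilities."""
    acts = [x]
    for W, b in hidden:
        acts.append(sigmoid(acts[-1] @ W + b))
    probs = softmax(acts[-1] @ out[0] + out[1])
    return acts, probs

def train_step(x, y, hidden, out, lr=0.1):
    """One full-batch gradient step on the cross-entropy loss.
    Updates every layer in place and returns the loss before the update."""
    acts, probs = forward(x, hidden, out)
    n = x.shape[0]
    W_out, b_out = out
    loss = -np.log(probs[np.arange(n), y] + 1e-12).mean()
    delta = (probs - np.eye(W_out.shape[1])[y]) / n  # gradient w.r.t. the logits
    back = delta @ W_out.T                           # propagate before updating W_out
    W_out -= lr * (acts[-1].T @ delta)
    b_out -= lr * delta.sum(axis=0)
    for i in reversed(range(len(hidden))):
        W, b = hidden[i]
        delta_h = back * acts[i + 1] * (1.0 - acts[i + 1])  # sigmoid derivative
        back = delta_h @ W.T
        W -= lr * (acts[i].T @ delta_h)
        b -= lr * delta_h.sum(axis=0)
    return loss

rng = np.random.default_rng(0)
# Placeholder weights (assumption): a real pipeline would load these from a trained DBM.
pretrained = [
    (0.1 * rng.standard_normal((20, 16)), np.zeros(16)),
    (0.1 * rng.standard_normal((16, 8)), np.zeros(8)),
]
hidden, out = init_dnn(pretrained, n_classes=3, rng=rng)
x = rng.standard_normal((30, 20))   # toy feature vectors
y = rng.integers(0, 3, size=30)     # toy identity labels
losses = [train_step(x, y, hidden, out) for _ in range(200)]
```

Under this sketch, the DBN-DNN and DBM-DNN variants would differ only in which generative model supplies `pretrained`; the softmax layer and the discriminative fine-tuning step are the same in both cases.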


Pages: 6
