Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras

Times cited: 5

Authors:

Kim, Hansung [1]

Remaggi, Luca [2]

Dourado, Aloisio [3]

de Campos, Teofilo [3]

Jackson, Philip J. B. [4]

Hilton, Adrian [4]

Affiliations:

[1] Univ Southampton, ECS, Southampton, Hants, England

[2] Creat Labs UK, London, England

[3] Univ Brasilia, Brasilia, DF, Brazil

[4] Univ Surrey, CVSSP, Guildford, Surrey, England

Source:

VIRTUAL REALITY | 2022 / Vol. 26 / Issue 3

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Audio-visual scene reproduction; Scene understanding; 3D reconstruction and completion; Spatial audio; VIRTUAL-REALITY; IMPLEMENTATION; PERCEPTION; FUTURE;

DOI:

10.1007/s10055-021-00594-3

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203; 0835;

Abstract:

As personalised immersive display systems have been intensely explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves the sense of immersion, but limited research progress has been made in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline to simultaneously reconstruct 3D geometry and acoustic properties of the environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry in order to adapt spatial audio reproduction to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.

Pages: 823-838

Number of pages: 16