Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras

Times cited: 5
Authors
Kim, Hansung [1 ]
Remaggi, Luca [2 ]
Dourado, Aloisio [3 ]
de Campos, Teofilo [3 ]
Jackson, Philip J. B. [4 ]
Hilton, Adrian [4 ]
Affiliations
[1] Univ Southampton, ECS, Southampton, Hants, England
[2] Creat Labs UK, London, England
[3] Univ Brasilia, Brasilia, DF, Brazil
[4] Univ Surrey, CVSSP, Guildford, Surrey, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Audio-visual scene reproduction; Scene understanding; 3D reconstruction and completion; Spatial audio; VIRTUAL-REALITY; IMPLEMENTATION; PERCEPTION; FUTURE;
DOI
10.1007/s10055-021-00594-3
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
As personalised immersive display systems have been intensely explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves the sense of immersion, but limited research progress has been made in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline that simultaneously reconstructs the 3D geometry and acoustic properties of the environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry, so that spatial audio reproduction can be adapted to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.
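To illustrate how a semantic scene model can feed acoustic properties, the sketch below maps each reconstructed surface's semantic label to an absorption coefficient and estimates the room's reverberation time with Sabine's formula, RT60 = 0.161 V / A, where V is the room volume in cubic metres and A is the sum of surface area times absorption coefficient. This is a minimal sketch under stated assumptions, not the authors' method: Sabine's formula is swapped in as a simple, well-known model, and the label set, the `ABSORPTION` table values, the `Surface` type and `sabine_rt60` are all hypothetical. Real absorption coefficients are frequency dependent, and the paper's acoustic modelling may differ.

```python
from dataclasses import dataclass

# Hypothetical, single-band absorption coefficients per semantic class.
# Illustrative values only; not taken from the paper.
ABSORPTION = {
    "wall": 0.05,
    "floor": 0.10,
    "ceiling": 0.60,
    "window": 0.18,
    "furniture": 0.30,
}


@dataclass
class Surface:
    label: str      # semantic class, e.g. as predicted by a scene-completion network
    area_m2: float  # surface area in square metres


def sabine_rt60(volume_m3: float, surfaces: list[Surface]) -> float:
    """Estimate reverberation time RT60 in seconds with Sabine's formula:
    RT60 = 0.161 * V / A, where A = sum(S_i * alpha_i)."""
    if volume_m3 <= 0:
        raise ValueError("volume must be positive")
    # Total equivalent absorption area (metric sabins).
    absorption_area = sum(s.area_m2 * ABSORPTION[s.label] for s in surfaces)
    if absorption_area <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / absorption_area


if __name__ == "__main__":
    # A 5 m x 4 m x 3 m room: walls 18 m perimeter x 3 m, floor and ceiling 20 m^2 each.
    room = [Surface("wall", 54.0), Surface("floor", 20.0), Surface("ceiling", 20.0)]
    print(f"{sabine_rt60(60.0, room):.2f} s")  # prints "0.58 s"
```

Keeping the label-to-material lookup separate from the formula means a more detailed model (per-band coefficients, or an image-source or ray-tracing renderer) could reuse the same semantic surface list.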
Pages: 823-838
Number of pages: 16