Efficient, compelling and immersive VR audio experience using Scene Based Audio/Higher Order Ambisonics

Cited by: 0
Authors:
Shivappa, Shankar [1 ]
Morrell, Martin [1 ]
Sen, Deep [1 ]
Peters, Nils [1 ]
Salehin, S. M. Akramus [1 ]
Affiliations:
[1] Qualcomm Technologies Inc. (QTI), San Diego, CA 92121, USA
Keywords:
DOI: Not available
Chinese Library Classification:
O42 [Acoustics]
Discipline Classification Codes:
070206; 082403
Abstract:
For a fully immersive and compelling VR experience, the acoustic illusion of being 'present' in the virtual world must be created. Achieving this illusion requires two things: (1) authentic spatial audio production and (2) tracking the listener's head position and orientation and adapting the audio scene accordingly. This paper shows why Scene-Based Audio (SBA), often used synonymously with Higher Order Ambisonics (HOA), is well suited to VR: it supports straightforward acoustic capture, offline content creation, post-production, transmission, and interactive rendering. Compared to object-based audio, SBA's rendering complexity is much lower, and compared to channel-based audio it can offer higher and more coherent spatial fidelity. One of the advantages of SBA is flexible rendering: the same audio stream can be rendered to a variety of loudspeaker formats as well as binaurally for headphone playback. The paper discusses the need for efficient SBA compression for VR content delivery and presents MPEG-H as an efficient and versatile delivery system for SBA. For a personalized VR experience, accurate binaural rendering is essential, and SBA can be binauralized efficiently: the number of convolutions is proportional to the number of HOA coefficients rather than to the number of virtual loudspeakers. SBA can therefore be rendered over a large number of virtual loudspeakers without increasing the binauralization cost. Furthermore, to improve spatial perception, SBA binauralization can use grids of ideally positioned virtual loudspeakers, for example based on Platonic solids or other regularly spaced configurations, that are impractical in reality and unsupported by channel-based audio formats. Real-time interactive soundfield rotation is indispensable for creating a VR experience. We show how an SBA soundfield can be rotated and further enhanced with other user-controlled effects such as zooming. The paper discusses use cases that demonstrate the capture, processing, and playback of SBA, and highlights potential pitfalls and design strategies for an end-to-end spatial audio system for VR. The authors conclude that SBA is a robust and compelling audio format for VR and that it can be readily distributed via broadcast or OTT for real-time consumer use.
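To make the complexity claim in the abstract concrete, the following is a minimal sketch, not taken from the paper and using hypothetical function names, of HOA-domain binauralization: a virtual-loudspeaker decoder and a set of HRIRs are folded offline into one filter per HOA coefficient and ear, so playback needs only 2 x (N+1)^2 convolutions no matter how many virtual loudspeakers were used to design the filters.

```python
import numpy as np
from scipy.signal import fftconvolve

def design_hoa_binaural_filters(decode_matrix, hrirs_left, hrirs_right):
    """Fold a virtual-loudspeaker decoder and HRIRs into per-coefficient filters.

    decode_matrix      : (num_speakers, num_coeffs) HOA-to-virtual-loudspeaker decoder
    hrirs_left/right   : (num_speakers, filter_len) HRIRs toward each virtual speaker

    The speaker dimension is summed out here, offline, so the number of
    virtual loudspeakers never affects the run-time rendering cost.
    """
    return decode_matrix.T @ hrirs_left, decode_matrix.T @ hrirs_right

def binauralize_hoa(hoa, filters_left, filters_right):
    """Render an HOA stream to two ears with 2 * num_coeffs convolutions.

    hoa                : (num_coeffs, num_samples) HOA coefficient signals
    filters_left/right : (num_coeffs, filter_len) precomputed per-coefficient filters
    """
    left = sum(fftconvolve(hoa[c], filters_left[c]) for c in range(hoa.shape[0]))
    right = sum(fftconvolve(hoa[c], filters_right[c]) for c in range(hoa.shape[0]))
    return np.stack([left, right])
```

In a head-tracked system, the listener's orientation would be applied as a soundfield rotation in the HOA coefficient domain before this binauralization step.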
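Soundfield rotation itself reduces to a small block-diagonal matrix applied to the coefficient signals. The first-order, yaw-only case below is a hedged illustration of that idea rather than the paper's implementation, and it assumes ACN channel ordering (W, Y, Z, X).

```python
import numpy as np

def rotate_foa_yaw(foa, yaw_rad):
    """Rotate a first-order ambisonic stream about the vertical (z) axis.

    foa     : (4, num_samples) coefficient signals in ACN order: W, Y, Z, X
    yaw_rad : rotation angle in radians (positive = counterclockwise)

    W (omnidirectional) and Z (vertical dipole) are unaffected by yaw;
    X and Y rotate like the x/y components of a vector. Higher orders use
    larger per-order rotation blocks, but the cost remains a small matrix
    multiply on the coefficient signals, independent of any loudspeaker layout.
    """
    w, y, z, x = foa
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return np.stack([w, y_rot, z, x_rot])
```

For head tracking, the scene would typically be rotated by the inverse of the head's yaw so that sources stay fixed in the virtual world as the listener turns.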
Pages: 10