Efficient, compelling and immersive VR audio experience using Scene Based Audio/Higher Order Ambisonics

Cited by: 0
Authors:
Shivappa, Shankar [1 ]
Morrell, Martin [1 ]
Sen, Deep [1 ]
Peters, Nils [1 ]
Salehin, S. M. Akramus [1 ]
Affiliations:
[1] Qualcomm Technologies Inc. (QTI), San Diego, CA 92121, USA
Keywords:
DOI: Not available
Chinese Library Classification:
O42 [Acoustics]
Discipline Classification Codes:
070206; 082403
Abstract:
For a fully immersive and compelling VR experience, the acoustic illusion of being 'present' in the virtual world must be created. Achieving this illusion requires two things: (1) authentic spatial audio production and (2) tracking the listener's head position and orientation and adapting the audio scene accordingly. This paper shows why Scene-Based Audio (SBA), often used synonymously with Higher Order Ambisonics (HOA), is well suited to VR: it supports straightforward acoustic capture, offline content creation, post-production, transmission, and interactive rendering. Compared to object-based audio, SBA's rendering complexity is much lower, and compared to channel-based audio it can offer higher and more coherent spatial fidelity. One of the advantages of SBA is flexible rendering: the same audio stream can be rendered to a variety of loudspeaker formats as well as binaurally for headphone playback. The paper discusses the need for efficient SBA compression for VR content delivery and presents MPEG-H as an efficient and versatile delivery system for SBA. For a personalized VR experience, accurate binaural rendering is essential, and SBA can be binauralized efficiently: the number of convolutions is proportional to the number of HOA coefficients rather than to the number of virtual loudspeakers. SBA can therefore be rendered over a large number of virtual loudspeakers without increasing the binauralization cost. Furthermore, to improve spatial perception, SBA binauralization can use grids of ideally positioned virtual loudspeakers, for example based on Platonic solids or other regularly spaced configurations, that are impractical in reality and unsupported by channel-based audio formats. Real-time interactive soundfield rotation is indispensable for creating a VR experience. We show how an SBA soundfield can be rotated and further enhanced with other user-controlled effects such as zooming. The paper discusses use cases that demonstrate the capture, processing, and playback of SBA, and highlights potential pitfalls and design strategies for an end-to-end spatial audio system for VR. The authors conclude that SBA is a robust and compelling audio format for VR and that it can be readily distributed via broadcast or OTT for real-time consumer use.
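To make the complexity claim in the abstract concrete, the following is a minimal sketch, not taken from the paper and using hypothetical function names, of HOA-domain binauralization: a virtual-loudspeaker decoder and a set of HRIRs are folded offline into one filter per HOA coefficient and ear, so playback needs only 2 x (N+1)^2 convolutions no matter how many virtual loudspeakers were used to design the filters.

```python
import numpy as np
from scipy.signal import fftconvolve

def design_hoa_binaural_filters(decode_matrix, hrirs_left, hrirs_right):
    """Fold a virtual-loudspeaker decoder and HRIRs into per-coefficient filters.

    decode_matrix      : (num_speakers, num_coeffs) HOA-to-virtual-loudspeaker decoder
    hrirs_left/right   : (num_speakers, filter_len) HRIRs toward each virtual speaker

    The speaker dimension is summed out here, offline, so the number of
    virtual loudspeakers never affects the run-time rendering cost.
    """
    return decode_matrix.T @ hrirs_left, decode_matrix.T @ hrirs_right

def binauralize_hoa(hoa, filters_left, filters_right):
    """Render an HOA stream to two ears with 2 * num_coeffs convolutions.

    hoa                : (num_coeffs, num_samples) HOA coefficient signals
    filters_left/right : (num_coeffs, filter_len) precomputed per-coefficient filters
    """
    left = sum(fftconvolve(hoa[c], filters_left[c]) for c in range(hoa.shape[0]))
    right = sum(fftconvolve(hoa[c], filters_right[c]) for c in range(hoa.shape[0]))
    return np.stack([left, right])
```

In a head-tracked system, the listener's orientation would be applied as a soundfield rotation in the HOA coefficient domain before this binauralization step.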
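Soundfield rotation itself reduces to a small block-diagonal matrix applied to the coefficient signals. The first-order, yaw-only case below is a hedged illustration of that idea rather than the paper's implementation, and it assumes ACN channel ordering (W, Y, Z, X).

```python
import numpy as np

def rotate_foa_yaw(foa, yaw_rad):
    """Rotate a first-order ambisonic stream about the vertical (z) axis.

    foa     : (4, num_samples) coefficient signals in ACN order: W, Y, Z, X
    yaw_rad : rotation angle in radians (positive = counterclockwise)

    W (omnidirectional) and Z (vertical dipole) are unaffected by yaw;
    X and Y rotate like the x/y components of a vector. Higher orders use
    larger per-order rotation blocks, but the cost remains a small matrix
    multiply on the coefficient signals, independent of any loudspeaker layout.
    """
    w, y, z, x = foa
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return np.stack([w, y_rot, z, x_rot])
```

For head tracking, the scene would typically be rotated by the inverse of the head's yaw so that sources stay fixed in the virtual world as the listener turns.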
Pages: 10