Estimating Ensemble Location and Width in Binaural Recordings of Music with Convolutional Neural Networks

Authors
Antoniuk, Pawel [1]
Zielinski, Slawomir K. [1]
Affiliations
[1] Bialystok Tech Univ, Fac Comp Sci, Bialystok, Poland
Keywords
ensemble width; ensemble location; binaural; spatial audio; localization; convolutional neural network; head-related transfer function; angle of arrival; SPATIAL AUDIO; SOUND SOURCE; ROBUST LOCALIZATION; HEAD MOVEMENTS; MODEL; SPEAKERS; DATABASE; FRONT
DOI
10.24425/aoa.2025.153648
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Binaural audio technology has been in existence for many years. However, its popularity has significantly increased over the past decade as a consequence of advancements in virtual reality and streaming techniques. Along with its growing popularity, the quantity of publicly accessible binaural audio recordings has also expanded. Consequently, there is now a need for automated and objective retrieval of spatial content information, with ensemble location and width being the most prominent. This study presents a novel method for estimating these ensemble parameters in binaural recordings of music. For this purpose, a dataset of 23,040 binaural recordings was synthesized from 192 publicly available music recordings using 30 head-related transfer functions. The synthesized excerpts were then used to train a multi-task spectrogram-based convolutional neural network model, aiming to estimate the ensemble location and width for unseen recordings. The results indicate that a model for estimating ensemble parameters can be successfully constructed with low prediction errors: 4.76° (±0.10°) for ensemble location and 8.57° (±0.19°) for ensemble width. The method developed in this study outperforms previous spatiogram-based techniques recently published in the literature and shows promise for future development as part of a novel tool for the analysis of binaural audio recordings.
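As a rough illustration of the dataset synthesis described in the abstract, the sketch below renders a mono source binaurally by convolving it with a left and a right head-related impulse response (HRIR). The abstract does not state how the excerpts were rendered, so convolution is an assumption here, and the function names (`convolve`, `binauralize`) and the toy signals are hypothetical. The final lines only check the arithmetic of the reported dataset size: 23,040 excerpts over 192 recordings and 30 HRTFs works out to 4 per recording-HRTF pair on average, although the abstract does not say how the excerpts were actually distributed.

```python
def convolve(signal, kernel):
    """Full linear convolution of two sequences (pure Python, O(N*M))."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Render a mono source to two ear signals with one HRIR pair.

    Assumption: rendering by plain convolution; the abstract does not
    describe the authors' actual synthesis pipeline.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy signals (hypothetical values, not real HRIR measurements).
mono = [1.0, 0.5, 0.25]
hrir_left = [1.0, 0.0]    # direct, undelayed path to the left ear
hrir_right = [0.0, 0.5]   # delayed and attenuated path to the right ear
left, right = binauralize(mono, hrir_left, hrir_right)

# Dataset-size arithmetic using the figures reported in the abstract.
n_recordings, n_hrtfs, n_excerpts = 192, 30, 23040
excerpts_per_pair = n_excerpts // (n_recordings * n_hrtfs)
```

In practice a real pipeline would sum several convolved sources placed at different azimuths to form an ensemble, and would use an FFT-based convolution for speed.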
Pages: 81-93
Number of pages: 13