MusicHiFi: Fast High-Fidelity Stereo Vocoding

Authors
Zhu, Ge [1, 2]
Caceres, Juan-Pablo [2]
Duan, Zhiyao [1]
Bryan, Nicholas J. [2]
Affiliations
[1] Univ Rochester, Rochester, NY 14627 USA
[2] Adobe Res, San Jose, CA 95110 USA
Funding
U.S. National Science Foundation
Keywords
Vocoders; Training; Generators; Bandwidth; Frequency modulation; Convolution; Computer architecture; Music generation; mel-spectrogram inversion; bandwidth extension; mono-to-stereo upmixing;
DOI
10.1109/LSP.2024.3432393
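Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract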
Diffusion-based audio and music generation models commonly generate audio by constructing an image representation of it (e.g., a mel-spectrogram) and then converting that representation to a waveform using a phase reconstruction model or vocoder. Typical vocoders, however, produce monophonic audio at lower resolutions (e.g., 16-24 kHz), which limits their usefulness. We propose MusicHiFi, an efficient high-fidelity stereophonic vocoder. Our method employs a cascade of three generative adversarial networks (GANs) that convert low-resolution mel-spectrograms to audio, upsample to high-resolution audio via bandwidth extension, and upmix to stereophonic audio. Compared to past work, we propose 1) a unified GAN-based generator and discriminator architecture and training procedure for each stage of our cascade, 2) a new fast, near downsampling-compatible bandwidth extension module, and 3) a new fast downmix-compatible mono-to-stereo upmixer that ensures the preservation of monophonic content in the output. We evaluate our approach using objective and subjective listening tests and find that it yields comparable or better audio quality, better spatialization control, and significantly faster inference speed compared to past work.
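The abstract does not say how downmix compatibility is achieved. One common way to obtain that property is mid-side coding, sketched below as an illustration only; this is an assumption, not a description taken from this record. The function names upmix_mid_side and downmix are hypothetical, and the side signal is hard-coded here as a stand-in for whatever a trained generator would predict.

```python
def upmix_mid_side(mono, side):
    """Build left/right channels from a mono (mid) signal and a side signal.

    Assumption: `side` would come from a learned model; here it is just a list.
    """
    if len(mono) != len(side):
        raise ValueError("mono and side must have the same length")
    left = [m + s for m, s in zip(mono, side)]
    right = [m - s for m, s in zip(mono, side)]
    return left, right


def downmix(left, right):
    """Average the two channels back down to mono."""
    return [(l + r) / 2 for l, r in zip(left, right)]


# Toy signals (hypothetical values chosen to be exact in binary floating point).
mono = [0.5, -0.25, 0.125, 0.0]
side = [0.125, 0.25, -0.5, 0.0625]

left, right = upmix_mid_side(mono, side)
recovered = downmix(left, right)

# Downmixing the stereo output returns the mono input, whatever `side` is.
assert all(abs(a - b) < 1e-12 for a, b in zip(recovered, mono))
```

Because the side term cancels in the average, the monophonic content survives for any side signal; only the spatial image depends on the prediction.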
Pages: 2365-2369
Page count: 5