FILTERED NOISE SHAPING FOR TIME DOMAIN ROOM IMPULSE RESPONSE ESTIMATION FROM REVERBERANT SPEECH

Cited by: 16
Authors
Steinmetz, Christian J. [1 ,2 ]
Ithapu, Vamsi Krishna [2 ]
Calamia, Paul [2 ]
Affiliations
[1] Queen Mary Univ London, Ctr Digital Mus, London, England
[2] Facebook Real Labs Res, Redmond, WA USA
Keywords
Room impulse response; acoustic matching; reverberation; synthesis; blind estimation;
DOI
10.1109/WASPAA52581.2021.9632680
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Deep learning approaches have emerged that aim to transform an audio signal so that it sounds as if it were recorded in the same room as a reference recording, with applications in both audio post-production and augmented reality. In this work, we propose FiNS, a Filtered Noise Shaping network that directly estimates the time domain room impulse response (RIR) from reverberant speech. Our domain-inspired architecture features a time domain encoder and a filtered noise shaping decoder that models the RIR as a summation of decaying filtered noise signals, along with direct sound and early reflection components. Previous methods for acoustic matching either use large models to transform audio to match the target room or predict parameters for algorithmic reverberators. Instead, blind estimation of the RIR enables efficient and realistic transformation with a single convolution. An evaluation demonstrates that our model not only synthesizes RIRs that match parameters of the target room, such as the T60 and DRR, but also more accurately reproduces perceptual characteristics of the target room, as shown in a listening test when compared to deep learning baselines.
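The decoder idea described in the abstract (an RIR built as a sum of decaying filtered noise signals plus a direct sound component, then applied with a single convolution) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `synth_rir` and `apply_rir`, the brick-wall band-pass masks, the fixed per-band T60 values, and the 0.5 tail gain are all assumptions made for illustration. In FiNS the filters and envelopes are predicted by the network, and early reflections are modeled separately; they are omitted here.

```python
import numpy as np


def synth_rir(t60_per_band, sr=16000, length_s=0.5, direct_gain=1.0, seed=0):
    """Toy filtered-noise RIR (hypothetical helper, not the FiNS decoder).

    Each band is white noise restricted to one frequency band and shaped
    by an exponential envelope that falls 60 dB after that band's T60.
    """
    rng = np.random.default_rng(seed)
    n = int(sr * length_s)
    t = np.arange(n) / sr
    num_bands = len(t60_per_band)

    # Equal-width band edges from 0 Hz to Nyquist (an assumption).
    edges = np.linspace(0.0, sr / 2, num_bands + 1)
    edges[-1] += 1.0  # make the last band include the Nyquist bin
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)

    tail = np.zeros(n)
    for b, t60 in enumerate(t60_per_band):
        noise = rng.standard_normal(n)
        # Brick-wall band-pass in the frequency domain; this stands in
        # for the learned FIR filters mentioned in the abstract.
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        band = np.fft.irfft(np.fft.rfft(noise) * mask, n=n)
        # Amplitude envelope: -60 dB (a factor of 10**-3) at t = t60.
        envelope = 10.0 ** (-3.0 * t / t60)
        tail += band * envelope

    # Normalize the late tail, then add a unit-delay-free direct sound.
    tail /= np.max(np.abs(tail)) + 1e-12
    rir = 0.5 * tail
    rir[0] += direct_gain
    return rir


def apply_rir(dry, rir):
    """Acoustic matching as a single (full) convolution."""
    return np.convolve(dry, rir)


if __name__ == "__main__":
    rir = synth_rir([0.3, 0.2, 0.1])  # longer decay at low frequencies
    dry = np.random.default_rng(1).standard_normal(1600)
    wet = apply_rir(dry, rir)
    print(rir.shape, wet.shape)
```

Once an RIR has been estimated, applying it to any dry signal costs one convolution, which is the efficiency argument the abstract makes against large audio-to-audio models.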
Pages: 221-225
Page count: 5
Related papers
50 items in total
  • [1] Blind Estimation of Room Impulse Response from Monaural Reverberant Speech with Segmental Generative Neural Network
    Liao, Zhiheng
    Xiong, Feifei
    Luo, Juan
    Cai, Minjie
    Chng, Eng Siong
    Feng, Jinwei
    Zhong, Xionghu
    INTERSPEECH 2023, 2023, : 2723 - 2727
  • [2] IMPULSE RESPONSE ESTIMATION FOR ROBUST SPEECH RECOGNITION IN A REVERBERANT ENVIRONMENT
    Ravanelli, Mirco
    Sosi, Alessandro
    Svaizer, Piergiorgio
    Omologo, Maurizio
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1668 - 1672
  • [3] Reverberation time estimation from speech signals based on blind room impulse response identification (L)
    Wu, Lifu
    Qiu, Xiaojun
    Burnett, Ian
    Guo, Yecai
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2015, 138 (02): : 731 - 734
  • [4] Reverberation time estimation from speech signals based on blind room impulse response identification (L)
    20153301179948
    1600, Acoustical Society of America (138):
  • [5] IMPROVING REVERBERANT SPEECH SEPARATION WITH SYNTHETIC ROOM IMPULSE RESPONSES
    Aralikatti, Rohith
    Ratnarajah, Anton
    Tang, Zhenyu
    Manocha, Dinesh
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 900 - 906
  • [6] A Frequency Domain Method for Speech Separation in a Reverberant Room
    Mischie, Septimiu
    Simion, Georgiana
    2010 9TH INTERNATIONAL SYMPOSIUM ON ELECTRONICS AND TELECOMMUNICATIONS (ISETC), 2010, : 303 - 306
  • [7] Robust speech recognition in reverberant environments by using an optimal synthetic room impulse response model
    Liu, Jindong
    Yang, Guang-Zhong
    SPEECH COMMUNICATION, 2015, 67 : 65 - 77
  • [8] RevRIR: Joint Reverberant Speech and Room Impulse Response Embedding using Contrastive Learning with Application to Room Shape Classification
    Bitterman, Jacob
    Levi, Daniel
    Diamandi, Hilel Hagai
    Gannot, Sharon
    Rosenwein, Tal
    INTERSPEECH 2024, 2024, : 3280 - 3284
  • [9] ESTIMATION OF ROOM DIMENSIONS FROM A SINGLE IMPULSE RESPONSE
    Markovic, Dejan
    Antonacci, Fabio
    Sarti, Augusto
    Tubaro, Stefano
    2013 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2013,
  • [10] Estimation of speech embedded in a reverberant environment with multiple sources of noise
    Barros, AK
    Itakura, F
    Rutkowski, T
    Mansour, A
    Ohnishi, N
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 629 - 632