The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech

Cited by: 2
Authors
Madmoni, Lior [1]
Tibor, Shir [2]
Nelken, Israel [2]
Rafaely, Boaz [1]
Affiliations
[1] Ben Gurion Univ Negev, Sch Elect & Comp Engn, IL-84105 Beer Sheva, Israel
[2] Hebrew Univ Jerusalem, Ctr Brain Sci, IL-9190401 Jerusalem, Israel
Funding
Israel Science Foundation
Keywords
Time-frequency analysis; Reverberation; Speech enhancement; Power measurement; Speech coding; Audio coding; Wavelength measurement; Spatial perception; reverberant speech; direct-to-reverberant ratio; binaural reproduction;
DOI
10.1109/TASLP.2021.3084742
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound which is reproduced from these bins is still not clear. It is the aim of this paper to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is performed using a listening experiment, where high DRR bins within a reverberant speech signal have been masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may better indicate the quality of spatial perception, compared to the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and potentially improve spatial perception. This was illustrated with an implementation of directional audio coding that was studied with an additional listening experiment supporting the previously described results.
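The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `drr_db` and `mask_high_drr_bins` are hypothetical, the direct and reverberant components are assumed to be available as separate complex STFT arrays of the same shape, and "masking" is assumed here to mean zeroing the whole bin of the mixture (the paper may instead remove only the direct component).

```python
import numpy as np


def drr_db(direct_stft, reverb_stft, eps=1e-12):
    """Per-bin direct-to-reverberant ratio in dB (eps avoids log of zero)."""
    direct_power = np.abs(direct_stft) ** 2
    reverb_power = np.abs(reverb_stft) ** 2
    return 10.0 * np.log10((direct_power + eps) / (reverb_power + eps))


def mask_high_drr_bins(direct_stft, reverb_stft, threshold_db):
    """Zero every TF bin whose DRR exceeds threshold_db.

    Returns the masked mixture STFT and the percentage of bins masked.
    Assumption: masking zeroes the full bin of the mixture.
    """
    drr = drr_db(direct_stft, reverb_stft)
    high = drr > threshold_db                 # boolean map of high-DRR bins
    mixture = direct_stft + reverb_stft       # reverberant speech in the TF domain
    masked = np.where(high, 0.0, mixture)
    percent_masked = high.mean() * 100.0
    return masked, percent_masked


if __name__ == "__main__":
    # Synthetic data only; the shapes and thresholds are arbitrary choices.
    rng = np.random.default_rng(0)
    shape = (257, 100)  # frequency bins x time frames
    direct = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    reverb = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    for thr in (0.0, 5.0, 10.0):
        _, pct = mask_high_drr_bins(direct, reverb, thr)
        print(f"threshold {thr:4.1f} dB -> {pct:.1f}% of bins masked")
```

Returning the masked percentage alongside the masked signal reflects the abstract's finding that this percentage may be a better indicator of spatial perception quality than the DRR threshold itself.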
Pages: 2037-2047
Page count: 11