The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech

Cited by: 2

Authors:

Madmoni, Lior [1]

Tibor, Shir [2]

Nelken, Israel [2]

Rafaely, Boaz [1]

Affiliations:

[1] Ben Gurion Univ Negev, Sch Elect & Comp Engn, IL-84105 Beer Sheva, Israel

[2] Hebrew Univ Jerusalem, Ctr Brain Sci, IL-9190401 Jerusalem, Israel

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29

Funding:

Israel Science Foundation;

Keywords:

Time-frequency analysis; Reverberation; Speech enhancement; Power measurement; Speech coding; Audio coding; Wavelength measurement; Spatial perception; reverberant speech; direct-to-reverberant ratio; binaural reproduction;

DOI:

10.1109/TASLP.2021.3084742

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound which is reproduced from these bins is still not clear. It is the aim of this paper to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is performed using a listening experiment, where high DRR bins within a reverberant speech signal have been masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may better indicate the quality of spatial perception, compared to the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and potentially improve spatial perception. This was illustrated with an implementation of directional audio coding that was studied with an additional listening experiment supporting the previously described results.
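The thresholding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the masking procedure in detail, so the sketch assumes that the direct and reverberant components are available separately in the TF domain (here they are synthetic random arrays), that the per-bin DRR is the power ratio of the two in dB, and that "masking" means zeroing the whole bin. The function names `drr_db`, `mask_high_drr`, and `threshold_for_fraction` are hypothetical.

```python
import numpy as np

def drr_db(direct_tf, reverb_tf, eps=1e-12):
    """Per-bin direct-to-reverberant ratio in dB (assumed definition: power ratio)."""
    direct_pow = np.abs(direct_tf) ** 2
    reverb_pow = np.abs(reverb_tf) ** 2
    return 10.0 * np.log10((direct_pow + eps) / (reverb_pow + eps))

def mask_high_drr(mix_tf, drr, threshold_db):
    """Zero every bin whose DRR exceeds threshold_db (assumed masking rule).

    Returns the masked TF signal and the fraction of bins that were masked.
    """
    masked = drr > threshold_db
    return np.where(masked, 0.0, mix_tf), masked.mean()

def threshold_for_fraction(drr, fraction):
    """DRR threshold (dB) above which roughly `fraction` of all bins lie."""
    return np.percentile(drr, 100.0 * (1.0 - fraction))

# Synthetic stand-ins for the STFT of the direct and reverberant parts.
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, time frames), hypothetical sizes
direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
reverb = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mix = direct + reverb

drr = drr_db(direct, reverb)
thr = threshold_for_fraction(drr, 0.25)  # aim to mask about 25% of the bins
masked_mix, frac = mask_high_drr(mix, drr, thr)
print(f"threshold = {thr:.2f} dB, fraction masked = {frac:.3f}")
```

Choosing the threshold from a target fraction, rather than fixing a dB value, mirrors the abstract's finding that the percentage of masked high-DRR bins may be the more informative quantity. The same DRR threshold can mask very different proportions of bins in different signals or rooms.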


Pages: 2037-2047

Number of pages: 11
