Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement

Cited by: 0
Authors
Devi Sowjanya
Shoba Sivapatham
Asutosh Kar
Vladimir Mladenovic
Affiliations
[1] IIITDM Kancheepuram, Department of Electronics and Communication Engineering
[2] VIT University, School of Electronics Engineering
[3] University of Kragujevac, Faculty of Technical Sciences Cacak
Keywords
Inter-channel correlation; Phase difference; Deep neural network; Speech enhancement; Ideal ratio mask; Objective metrics
DOI
Not available
Abstract
Masking-based targets, which map noisy speech to time–frequency (T–F) units, are the most commonly used training targets in supervised learning algorithms and have a remarkable impact on their performance. Traditional T–F masks such as the ideal ratio mask (IRM) perform strongly but operate only in the magnitude domain. The bounded IRM with phase constraint (BIRMP) includes the phase difference but does not exploit channel correlation, whereas the proposed ratio mask (pRM) considers channel correlation but is computed only in the magnitude domain. This work proposes a new mask, the phase correlation ideal ratio mask (PCIRM), which includes both the inter-channel correlation and the phase difference between the noisy speech ($N_\mathrm{S}$), the noise ($N$) and the clean speech ($C_\mathrm{S}$). Considering these factors increases the proportion of $C_\mathrm{S}$ and decreases the proportion of unwanted noise in the speech components, and conversely for the noise components, which makes the mask more precise. Experiments are conducted at different SNR levels using the TIMIT and NOISEX-92 datasets, and the results are compared with existing state-of-the-art approaches. The results show that the proposed mask outperforms BIRMP and pRM in terms of speech quality and intelligibility.
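For orientation, the magnitude-only IRM that the abstract uses as its baseline can be sketched as below. This is a minimal illustration of the commonly used textbook definition, not the paper's PCIRM, whose formula is not given in this record. The function name `ideal_ratio_mask`, the exponent `beta = 0.5`, the stabilizer `eps`, and the toy magnitude values are all assumptions made for the example.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """Conventional IRM computed from magnitude spectrograms only.

    Assumed definition: (|S|^2 / (|S|^2 + |N|^2)) ** beta.
    Phase and inter-channel correlation are NOT used here.
    """
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    # eps avoids division by zero in T-F units where both powers vanish.
    return (clean_pow / (clean_pow + noise_pow + eps)) ** beta

# Hypothetical toy magnitudes: 2 frequency bins x 3 time frames.
clean_mag = np.array([[3.0, 0.0, 1.0],
                      [1.0, 2.0, 0.0]])
noise_mag = np.array([[4.0, 1.0, 1.0],
                      [0.0, 2.0, 5.0]])

mask = ideal_ratio_mask(clean_mag, noise_mag)
print(mask)
```

Each mask value lies between 0 and 1 and depends only on the clean and noise magnitudes in that T–F unit, which is the limitation the abstract attributes to the traditional IRM.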
Pages: 4117-4135
Number of pages: 18