Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement

Cited by: 0
Authors
Devi Sowjanya
Shoba Sivapatham
Asutosh Kar
Vladimir Mladenovic
Affiliations
[1] IIITDM Kancheepuram, Department of Electronics and Communication Engineering
[2] VIT University, School of Electronics Engineering
[3] University of Kragujevac, Faculty of Technical Sciences Cacak
Keywords
Inter-channel correlation; Phase difference; Deep neural network; Speech enhancement; Ideal ratio mask; Objective metrics
DOI
Not available
Abstract
Masking-based approaches, which map noisy speech to time–frequency (T–F) units, are the most commonly used training targets and strongly affect the performance of supervised learning algorithms. Traditional T–F masks such as the ideal ratio mask (IRM) perform well but operate only in the magnitude domain. The bounded IRM with phase constraint (BIRMP) includes the phase difference but does not exploit channel correlation, whereas the proposed ratio mask (pRM) considers channel correlation but is computed only in the magnitude domain. This work proposes a new mask, the phase correlation ideal ratio mask (PCIRM), which includes both the inter-channel correlation and the phase difference between the noisy speech ($N_\mathrm{S}$), the noise ($N$), and the clean speech ($C_\mathrm{S}$). Considering these factors increases the proportion of $C_\mathrm{S}$ and decreases the proportion of unwanted noise in the speech components, and conversely for the noise components, which makes the mask more precise. Experiments are conducted at different SNR levels using the TIMIT and NOISEX-92 datasets, and the results are compared with existing state-of-the-art approaches. The results show that the proposed mask outperforms BIRMP and pRM in terms of speech quality and intelligibility.
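For orientation, the magnitude-only IRM that the abstract uses as its baseline can be sketched as below. This is a minimal illustration, not the authors' code. It assumes the widely used definition IRM = (|S|^2 / (|S|^2 + |N|^2))^beta with beta = 0.5, which the abstract does not state. The helper names `stft_mag` and `ideal_ratio_mask`, the frame settings, and the synthetic sine and Gaussian signals (stand-ins for TIMIT speech and NOISEX-92 noise) are all hypothetical. PCIRM itself is not implemented, because its formula is not given in this record.

```python
import numpy as np

def stft_mag(x, frame_len=512, hop=256):
    """Magnitude spectrogram (frames x bins) from a Hann-windowed framed FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5, eps=1e-12):
    """Assumed standard IRM: (|S|^2 / (|S|^2 + |N|^2)) ** beta, per T-F unit."""
    clean_pow = clean_mag ** 2
    noise_pow = noise_mag ** 2
    return (clean_pow / (clean_pow + noise_pow + eps)) ** beta

# Synthetic stand-ins (assumptions): a 440 Hz tone for clean speech,
# white Gaussian noise for the interference, 1 second at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.5 * rng.standard_normal(t.size)
noisy = clean + noise

# The mask uses magnitudes only; phase plays no role in this baseline.
irm = ideal_ratio_mask(stft_mag(clean), stft_mag(noise))
enhanced_mag = irm * stft_mag(noisy)
```

The mask is close to 1 in T-F units dominated by the tone and close to 0 where only noise is present. Because only magnitudes enter the computation, phase differences and inter-channel correlation are ignored, which is the limitation the abstract attributes to the IRM.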
Pages: 4117-4135
Page count: 18
Related papers
50 in total
  • [1] Mask Estimation Using Phase Information and Inter-channel Correlation for Speech Enhancement
    Sowjanya, Devi
    Sivapatham, Shoba
    Kar, Asutosh
    Mladenovic, Vladimir
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2022, 41 (07) : 4117 - 4135
  • [2] A Single Image Enhancement using Inter-channel Correlation
    Kim, Jin
    Jeong, Soowoong
    Kim, Yong-Ho
    Lee, Sangkeun
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2012, : 496 - 497
  • [3] Multiple Speech Source Separation Using Inter-Channel Correlation and Relaxed Sparsity
    Jia, Maoshen
    Sun, Jundai
    Zheng, Xiguang
    [J]. APPLIED SCIENCES-BASEL, 2018, 8 (01):
  • [4] Mask estimation incorporating phase-sensitive information for speech enhancement
    Wang, Xianyun
    Bao, Changchun
    [J]. APPLIED ACOUSTICS, 2019, 156 : 101 - 112
  • [5] Demosaicking using inter-channel correlation in wavelet domain
    Kim, Hyuk Su
    Jeong, Bo Gyu
    Kim, Sang Soo
    Eom, Il Kyu
    [J]. PROCEEDINGS OF THE NINTH IASTED INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING, 2007, : 109 - 114
  • [6] Target Speech Detection Based on Microphone Array Using Inter-channel Phase Differences
    Guo, Yanmeng
    Li, Kai
    Fu, Qiang
    Yan, Yonghong
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS (ICCE), 2012, : 247 - 248
  • [7] Estimation of Inter-Channel Phase Differences using Non-Negative Matrix Factorization
    Kayser, Hendrik
    Anemueller, Joern
    Adiloglu, Kamil
    [J]. 2014 IEEE 8TH SENSOR ARRAY AND MULTICHANNEL SIGNAL PROCESSING WORKSHOP (SAM), 2014, : 77 - 80
  • [8] Harmonic Phase Estimation in Single-Channel Speech Enhancement Using Phase Decomposition and SNR Information
    Mowlaee, Pejman
    Kulmer, Josef
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (09) : 1521 - 1532
  • [9] Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition
    Kulmer, Josef
    Mowlaee, Pejman
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2015, 22 (05) : 598 - 602
  • [10] A Demosaicking Algorithm with Adaptive Inter-Channel Correlation
    Duran, Joan
    Buades, Antoni
    [J]. IMAGE PROCESSING ON LINE, 2015, 5 : 311 - 327