Single-channel speech enhancement using colored spectrograms

Authors
Gul, Sania [1, 4]
Khan, Muhammad Salman [2]
Fazeel, Muhammad [3]
Affiliations
[1] Univ Engn & Technol, Dept Elect Engn, Peshawar, Pakistan
[2] Qatar Univ, Coll Engn, Dept Elect Engn, Doha, Qatar
[3] Natl Univ Sci & Technol, Sch Mech & Mfg Engn, Islamabad, Pakistan
[4] Natl Ctr Artificial Intelligence, Intelligent Informat Proc Lab, Artificial Intelligence Healthcare, Peshawar, Pakistan
Keywords
Colormaps; Pix2pix; Spectrograms; Speech denoising; Deep neural network;
DOI
10.1016/j.csl.2024.101626
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Speech enhancement concerns the processes required to remove unwanted background sounds from the target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented using colored spectrograms. We propose the use of a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train it over colored spectrograms of speech to denoise them. After denoising, the colors of the spectrograms are translated to short-time Fourier transform (STFT) magnitudes using a shallow regression neural network. These estimated STFT magnitudes are then combined with the noisy phases to obtain the enhanced speech. The results show an improvement of almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1% in the short-time objective intelligibility (STOI) over the unprocessed noisy data. The gain in quality and intelligibility over the unprocessed signal is almost equal to the gain achieved by the baseline methods used for comparison with the proposed model, but at a much lower computational cost. The proposed solution offers a comparable PESQ score at an almost 10 times lower computational cost than a similar baseline model trained on grayscale spectrograms, which generated the highest PESQ score. It incurs only a 1% deficit in STOI at a 28 times lower computational cost when compared with another baseline system, based on a convolutional neural network-GAN (CNNGAN), that produces the most intelligible speech.
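The final step described in the abstract, combining estimated STFT magnitudes with the noisy phases, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `reconstruct_with_noisy_phase`, the sampling rate `fs`, and the window length `nperseg` are assumptions chosen for illustration, and the magnitude estimate would in practice come from the denoising and regression networks, which are not modeled here.

```python
import numpy as np
from scipy.signal import stft, istft


def reconstruct_with_noisy_phase(noisy, est_magnitude, fs=16000, nperseg=256):
    """Rebuild a waveform from an estimated magnitude and the noisy phase.

    fs and nperseg are illustrative assumptions, not values from the paper.
    """
    # STFT of the noisy input; only its phase is reused.
    _, _, noisy_spec = stft(noisy, fs=fs, nperseg=nperseg)
    if est_magnitude.shape != noisy_spec.shape:
        raise ValueError("est_magnitude must match the noisy STFT shape")

    # Pair the estimated magnitude with the unmodified noisy phase.
    noisy_phase = np.angle(noisy_spec)
    enhanced_spec = est_magnitude * np.exp(1j * noisy_phase)

    # Inverse STFT back to the time domain, trimmed to the input length.
    _, enhanced = istft(enhanced_spec, fs=fs, nperseg=nperseg)
    return enhanced[: len(noisy)]


# Sanity check with a stand-in "estimate": reusing the noisy magnitude
# itself should return the noisy signal essentially unchanged.
rng = np.random.default_rng(0)
noisy_signal = rng.standard_normal(16000)
_, _, spec = stft(noisy_signal, fs=16000, nperseg=256)
roundtrip = reconstruct_with_noisy_phase(noisy_signal, np.abs(spec))
```

Because the phase is left untouched, any residual phase distortion from the noise remains in the output; the sketch only shows how the magnitude estimate enters the reconstruction.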
Pages: 11