Single-channel speech enhancement using colored spectrograms

Cited by: 0
Authors
Gul, Sania [1, 4]
Khan, Muhammad Salman [2]
Fazeel, Muhammad [3]
Affiliations
[1] Univ Engn & Technol, Dept Elect Engn, Peshawar, Pakistan
[2] Qatar Univ, Coll Engn, Dept Elect Engn, Doha, Qatar
[3] Natl Univ Sci & Technol, Sch Mech & Mfg Engn, Islamabad, Pakistan
[4] Natl Ctr Artificial Intelligence, Intelligent Informat Proc Lab, Artificial Intelligence Healthcare, Peshawar, Pakistan
Keywords
Colormaps; Pix2pix; Spectrograms; Speech denoising; Deep neural network
DOI
10.1016/j.csl.2024.101626
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech enhancement concerns the processes required to remove unwanted background sounds from the target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented using colored spectrograms. We propose the use of a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train it on colored spectrograms of speech to denoise them. After denoising, the colors of the spectrograms are translated to short-time Fourier transform (STFT) magnitudes using a shallow regression neural network. These estimated STFT magnitudes are then combined with the noisy phases to obtain the enhanced speech. The results show an improvement of almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1% in the short-time objective intelligibility (STOI) over the unprocessed noisy data. This gain in quality and intelligibility is almost equal to that achieved by the baseline methods used for comparison, but at a much lower computational cost. The proposed solution offers a PESQ score comparable to that of a similar baseline model trained on grayscale spectrograms, which generated the highest PESQ score, at almost 10 times lower computational cost. It shows only a 1% deficit in STOI, at 28 times lower computational cost, compared with another baseline system based on a convolutional neural network GAN (CNNGAN), which produces the most intelligible speech.
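The final step described in the abstract, pairing estimated STFT magnitudes with the phases of the noisy signal, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `recombine`, the toy frame values, and the magnitudes (which stand in for the output of the regression network) are all assumptions made for illustration.

```python
import cmath


def recombine(est_magnitudes, noisy_stft):
    """Pair each estimated magnitude with the phase of the matching noisy STFT bin.

    Hypothetical helper: both arguments are flat sequences covering the same bins.
    """
    if len(est_magnitudes) != len(noisy_stft):
        raise ValueError("magnitudes and noisy STFT must have the same length")
    # cmath.phase gives the angle of the noisy bin; cmath.rect rebuilds a
    # complex value from the new magnitude and that unchanged angle.
    return [cmath.rect(mag, cmath.phase(z))
            for mag, z in zip(est_magnitudes, noisy_stft)]


# Toy single frame of a noisy STFT (complex bins); values are illustrative only.
noisy_frame = [3 + 4j, -1 + 1j, 0.5 - 2j]

# Assumed denoised magnitudes, standing in for the regression network output.
est_mag = [2.5, 0.2, 1.0]

enhanced_frame = recombine(est_mag, noisy_frame)
for noisy_bin, enhanced_bin in zip(noisy_frame, enhanced_frame):
    print(noisy_bin, "->", enhanced_bin)
```

In a full pipeline the recombined bins would be passed through an inverse STFT to produce the time-domain waveform. Reusing the noisy phase avoids estimating phase at all, at the price of leaving phase distortion in the output.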
Pages: 11