Effect of spectrogram resolution on deep-neural-network-based speech enhancement

Cited by: 6
Authors
Takeuchi, Daiki [1]
Yatabe, Kohei [1]
Koizumi, Yuma [2]
Oikawa, Yasuhiro [1]
Harada, Noboru [2]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Shinjuku Ku, 3-4-1 Okubo, Tokyo 1698555, Japan
[2] NTT Media Intelligence Labs, Tokyo, Japan
Keywords
Speech enhancement; Deep learning; Time-frequency transform; Redundancy; Experimental investigation; Optimization; Separation
DOI
10.1250/ast.41.769
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In recent single-channel speech enhancement, deep neural networks (DNNs) have played an important role in achieving high performance. One standard use of a DNN is to construct a mask-generating function for time-frequency (T-F) masking. To apply a mask in the T-F domain, the short-time Fourier transform (STFT) is usually used because it is well understood and invertible. While the mask-generating regression function has been studied for a long time, there has been less research on the T-F transform from the viewpoint of speech enhancement. Since the performance of speech enhancement depends on both the T-F mask estimator and the T-F transform, investigating the T-F transform should be beneficial for designing a better enhancement system. In this paper, as a step toward an optimal T-F transform for speech enhancement, we experimentally investigated the effect of the STFT parameter settings on a DNN-based mask estimator. We conducted experiments using three types of DNN architectures with three types of loss functions, and the results suggest that U-Net is robust to the parameter setting, whereas fully connected and BLSTM networks are not.
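To make "parameter settings of STFT" concrete, the following minimal Python sketch shows how window length and hop size determine the time resolution, frequency resolution, and redundancy of a spectrogram. It is an illustration only, not the authors' code: the function name `stft_resolution`, the `StftResolution` record, and the example values (16 kHz sampling rate, 512-sample window, 128-sample hop) are all hypothetical, and the sketch assumes a real-valued signal whose FFT length equals the window length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StftResolution:
    """Resolution figures for one STFT setting (hypothetical helper record)."""
    window_ms: float       # duration covered by one analysis window
    hop_ms: float          # time step between consecutive frames
    bin_spacing_hz: float  # spacing between adjacent frequency bins
    freq_bins: int         # one-sided bin count for a real-valued signal
    redundancy: float      # window length divided by hop size


def stft_resolution(sample_rate_hz: int, window_len: int, hop_len: int) -> StftResolution:
    """Compute basic resolution figures for an STFT setting.

    Assumes the FFT length equals the window length (no zero padding).
    """
    if sample_rate_hz <= 0 or window_len <= 0 or hop_len <= 0:
        raise ValueError("all parameters must be positive")
    if hop_len > window_len:
        raise ValueError("hop_len must not exceed window_len")
    return StftResolution(
        window_ms=1000.0 * window_len / sample_rate_hz,
        hop_ms=1000.0 * hop_len / sample_rate_hz,
        bin_spacing_hz=sample_rate_hz / window_len,
        freq_bins=window_len // 2 + 1,
        redundancy=window_len / hop_len,
    )


# Illustrative values only; they are not taken from the paper.
example = stft_resolution(sample_rate_hz=16000, window_len=512, hop_len=128)
print(example)
```

A longer window narrows the bin spacing (finer frequency resolution) but lengthens the span of each frame (coarser time resolution), while a smaller hop raises the redundancy. These are the kinds of trade-offs a mask estimator is exposed to when the STFT setting changes.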
Pages: 769-775
Page count: 7