INVERTIBLE DNN-BASED NONLINEAR TIME-FREQUENCY TRANSFORM FOR SPEECH ENHANCEMENT

Cited by: 0
Authors
Takeuchi, Daiki [1]
Yatabe, Kohei [1]
Koizumi, Yuma [2]
Oikawa, Yasuhiro [1]
Harada, Noboru [2]
Affiliations
[1] Waseda Univ, Dept Intermedia Art & Sci, Tokyo, Japan
[2] NTT Media Intelligence Labs, Tokyo, Japan
Keywords
Deep neural network (DNN); invertible DNN; i-RevNet; filterbank; lifting scheme
DOI
10.1109/icassp40776.2020.9053723
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods have considered end-to-end networks that directly estimate the enhanced signals without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to understand. Therefore, some end-to-end methods use a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack properties that are important for ordinary signal processing. In this paper, perfect reconstruction is considered as the important property of the T-F transform. An invertible nonlinear T-F transform is constructed from DNNs and learned from data so that the obtained transform is a perfect-reconstruction filterbank.
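The perfect-reconstruction property named in the abstract can be illustrated with a single lifting step, the building block listed in the keywords. The following is a minimal sketch, not the authors' network: the `predict` and `update` functions, their scalar weights, and the test signal are all hypothetical stand-ins for the learned DNN blocks. The point it shows is that the inverse only subtracts what the forward pass added, so reconstruction holds even though the operators themselves are nonlinear and not invertible.

```python
import numpy as np

def predict(even, w):
    # Hypothetical nonlinear "predict" operator (stand-in for a small DNN).
    # It does not need to be invertible itself.
    return np.tanh(w * even)

def update(detail, w):
    # Hypothetical nonlinear "update" operator (stand-in for a small DNN).
    return 0.5 * np.tanh(w * detail)

def forward(signal, w_p, w_u):
    # Split into even/odd samples, then apply one predict/update lifting step.
    even, odd = signal[0::2], signal[1::2]
    detail = odd - predict(even, w_p)
    approx = even + update(detail, w_u)
    return approx, detail

def inverse(approx, detail, w_p, w_u):
    # Undo the steps in reverse order by subtracting/adding the same terms.
    even = approx - update(detail, w_u)
    odd = detail + predict(even, w_p)
    signal = np.empty(even.size + odd.size, dtype=approx.dtype)
    signal[0::2] = even
    signal[1::2] = odd
    return signal

# Assumed toy input: 16 random samples (even length is required by the split).
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
approx, detail = forward(x, w_p=0.7, w_u=1.3)
x_rec = inverse(approx, detail, w_p=0.7, w_u=1.3)
max_error = np.max(np.abs(x - x_rec))  # differs from zero only by rounding
```

In a trainable setting the scalar weights would be replaced by network parameters; the reconstruction argument does not depend on what `predict` and `update` compute, only on the add/subtract structure.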
Pages: 6644-6648
Number of pages: 5
Related papers
50 in total
  • [1] Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis
    Song, Eunwoo
    Soong, Frank K.
    Kang, Hong-Goo
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2253 - 2257
  • [2] Robust Beamforming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
    Jiang, Wenbin
    Wen, Fei
    Liu, Peilin
    [J]. IEEE ACCESS, 2018, 6 : 52385 - 52392
  • [3] DNN-BASED ENHANCEMENT OF NOISY AND REVERBERANT SPEECH
    Zhao, Yan
    Wang, DeLiang
    Merks, Ivo
    Zhang, Tao
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6525 - 6529
  • [4] DNN-BASED SPEECH ENHANCEMENT USING MBE MODEL
    Huang, Qizheng
    Bao, Changchun
    Wang, Xianyun
    Xiang, Yang
    [J]. 2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 196 - 200
  • [5] DNN-Based Cepstral Excitation Manipulation for Speech Enhancement
    Elshamy, Samy
    Fingscheidt, Tim
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (11) : 1803 - 1814
  • [6] DNN-Based Speech Enhancement via Integrating NMF and CASA
    Yan, Bofang
    Bao, Changchun
    Bai, Zhigang
    [J]. 2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 435 - 439
  • [7] DNN-Based Linear Prediction Residual Enhancement for Speech Dereverberation
    Feng, Xinyang
    Li, Nuo
    He, Zunwen
    Zhang, Yan
    Zhang, Wancheng
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 541 - 545
  • [8] Boosting DNN-Based Speech Enhancement via Explicit Transformations
    Wang, Qing
    Du, Jun
    Dai, Li-Rong
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [9] DNN-Based Calibrated-Filter Models for Speech Enhancement
    Yazid Attabi
    Benoit Champagne
    Wei-Ping Zhu
    [J]. Circuits, Systems, and Signal Processing, 2021, 40 : 2926 - 2949
  • [10] DNN-BASED AR-WIENER FILTERING FOR SPEECH ENHANCEMENT
    Yang, Yan
    Bao, Changchun
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2901 - 2905