EXPLORING TRADEOFFS IN MODELS FOR LOW-LATENCY SPEECH ENHANCEMENT

Cited by: 0
Authors
Wilson, Kevin [1]
Chinen, Michael [1]
Thorpe, Jeremy [1]
Patton, Brian [1]
Hershey, John [1]
Saurous, Rif A. [1]
Skoglund, Jan [1]
Lyon, Richard F. [1]
Affiliations
[1] Google Res, Google Chrome Audio, Mountain View, CA 94043 USA
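Keywords
speech enhancement; low-latency inference
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract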
We explore a variety of neural network configurations for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on previous state-of-the-art performance on the CHiME2 speech enhancement task by 0.4 decibels in signal-to-distortion ratio (SDR). We examine trade-offs such as non-causal look-ahead, computation, and parameter count versus enhancement performance and find that zero-look-ahead models can achieve, on average, within 0.03 dB SDR of our best bidirectional model. Further, we find that 200 milliseconds of look-ahead is sufficient to achieve equivalent performance to our best bidirectional model.
Pages: 366-370
Page count: 5