TfCleanformer: A streaming, array-agnostic, full- and sub-band modeling front-end for robust ASR

Authors
Heitkaemper, Jens [1]
Caroselli, Joe [1]
Narayanan, Arun [1]
Howard, Nathan [1]
Affiliations
[1] Google LLC, Mountain View, CA 94043 USA
Source
Keywords
multi-channel; speech enhancement; neural networks
DOI
10.21437/Interspeech.2024-2378
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multiple recent publications have demonstrated the benefits of neural-network-based enhancement in the time-frequency domain. This paper builds on those findings to improve upon a recently published streaming, array-agnostic multi-channel enhancement system called Cleanformer. The proposed streaming enhancement system achieves competitive results against a non-causal state-of-the-art model on a source separation task, outperforming Cleanformer. Additionally, the presented model improves upon Cleanformer's enhancement results in multiple challenging environments without introducing further latency. A short ablation study evaluates how much each proposed change contributes to the improved performance.
Pages: 4473-4477
Page count: 5