TfCleanformer: A streaming, array-agnostic, full- and sub-band modeling front-end for robust ASR

Cited by: 0

Authors:

Heitkaemper, Jens [1]

Caroselli, Joe [1]

Narayanan, Arun [1]

Howard, Nathan [1]

Affiliations:

[1] Google LLC, Mountain View, CA 94043 USA

Source:

INTERSPEECH 2024 | 2024

Keywords:

multi-channel; speech enhancement; neural networks

DOI:

10.21437/Interspeech.2024-2378

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Multiple recent publications have demonstrated the benefits of neural-network-based enhancement in the time-frequency domain. This paper builds on those findings to improve a recently published streaming, array-agnostic multi-channel enhancement system called Cleanformer. The proposed streaming enhancement system outperforms Cleanformer and achieves competitive results against a non-causal state-of-the-art model on a source separation task. Additionally, the presented model improves upon Cleanformer's enhancement results in multiple challenging environments without introducing further latency. A short ablation study evaluates the influence of the proposed changes on the improved performance.


Pages: 4473-4477

Page count: 5
