Noise-Aware Extended U-Net With Split Encoder and Feature Refinement Module for Robust Speaker Verification in Noisy Environments

Cited by: 0
Authors
Lim, Chan-Yeong [1]
Heo, Jungwoo [1]
Kim, Ju-Ho [1]
Shin, Hyun-Seo [1]
Yu, Ha-Jin [1]
Affiliations
[1] Univ Seoul, Sch Comp Sci, Seoul 02504, South Korea
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Research Foundation of Singapore;
Keywords
Noise measurement; Feature extraction; Training; Decoding; Speech enhancement; Convolution; Noise-aware extended U-Net; split encoder; feature refinement; feature enhancement; joint training; noisy environments; speaker verification;
DOI
10.1109/ACCESS.2024.3433465
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Speech data gathered from real-world environments typically contain noise, a significant element that undermines the performance of deep neural network-based speaker verification (SV) systems. To mitigate performance degradation due to noise and develop noise-robust SV systems, several researchers have integrated speech enhancement (SE) and SV systems. We previously proposed the extended U-Net (ExU-Net), which achieved state-of-the-art performance in SV in noisy environments by jointly training SE and SV systems. In the SE field, some studies have shown that recognizing noise components within speech can improve the system's performance. Inspired by these approaches, we propose a noise-aware ExU-Net (NA-ExU-Net) that builds on the ExU-Net architecture and explicitly accounts for noise information during the SE process. The proposed system comprises a Split Encoder and a feature refinement module (FRM). The Split Encoder handles the speech and noise separately by dividing the encoder blocks, whereas FRM is designed to inhibit the propagation of irrelevant data via skip connections. To validate the effectiveness of our proposed framework in noisy conditions, we evaluated the models on the VoxCeleb1 test set with added noise from the MUSAN corpus. The experimental results demonstrate that NA-ExU-Net outperforms the ExU-Net and other baseline systems under all evaluation conditions. Furthermore, evaluations in out-of-domain noise environments indicate that NA-ExU-Net significantly surpasses existing frameworks, highlighting its robustness and generalization capabilities. The code used in our experiments is available at https://github.com/chan-yeong0519/NA-ExU-Net.
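The abstract mentions evaluating on test utterances with added noise. As a rough illustration of what mixing noise into speech at a target signal-to-noise ratio can look like, here is a minimal sketch in plain Python. It is not taken from the authors' repository: the function name mix_at_snr, the noise-tiling strategy, and the 5 dB value in the usage lines are assumptions for illustration only, and the record does not state which SNR levels the paper used.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; both inputs are lists of float samples.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Tile the noise if it is shorter than the utterance, then trim to length.
    if len(noise) < len(speech):
        noise = list(noise) * math.ceil(len(speech) / len(noise))
    noise = noise[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        return list(speech)  # silent noise: nothing to add

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Usage with synthetic signals standing in for real audio (assumed values).
rng = random.Random(0)
speech = [math.sin(0.01 * i) for i in range(1000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(300)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Lower SNR values give a harder test condition, since the noise gain grows as snr_db decreases.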
Pages: 111673 - 111682
Page count: 10
Related papers
4 items in total
  • [1] Extended U-Net for Speaker Verification in Noisy Environments
    Kim, Ju-ho
    Heo, Jungwoo
    Shim, Hye-jin
    Yu, Ha-Jin
    INTERSPEECH 2022, 2022, : 590 - 594
  • [2] Improving Speaker Verification With Noise-Aware Label Ensembling and Sample Selection: Learning and Correcting Noisy Speaker Labels
    Fang, Zhihua
    He, Liang
    Li, Lin
    Hu, Ying
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2988 - 3001
  • [3] Attention U-Net with Feature Fusion Module for Robust Defect Detection
    Xiong, Yu-Jie
    Gao, Yong-Bin
    Wu, Hong
    Yao, Yao
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2021, 30 (15)
  • [4] An EffcientNet-encoder U-Net Joint Residual Refinement Module with Tversky-Kahneman Baroni-Urbani-Buser loss for biomedical image Segmentation
    Nham, Do-Hai-Ninh
    Trinh, Minh-Nhat
    Nguyen, Viet-Dung
    Pham, Van-Truong
    Tran, Thi-Thao
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 83