MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION USING DEEP COMPLEX UNET

Cited by: 8
Authors
Kong, Yuxiang [1, 2]
Wu, Jian [1]
Wang, Quandong [2]
Gao, Peng [2]
Zhuang, Weiji [2]
Wang, Yujun [2]
Xie, Lei [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Xiaomi Inc, Beijing, Peoples R China
Keywords
Multi-channel speech recognition; robust speech recognition; deep learning; deep complex Unet
DOI
10.1109/SLT48900.2021.9383492
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The front-end module in multi-channel automatic speech recognition (ASR) systems mainly uses microphone array techniques to produce enhanced signals in noisy conditions with reverberation and echoes. Recently, neural network (NN) based front-ends have shown promising improvements over conventional signal processing methods. In this paper, we propose to adopt the architecture of deep complex Unet (DCUnet), a powerful complex-valued Unet-structured speech enhancement model, as the front-end of the multi-channel acoustic model, and to integrate the two in a multi-task learning (MTL) framework, with a cascaded framework included for comparison. We also investigate the proposed methods with several training strategies to improve recognition accuracy on 1000 hours of real-world XiaoMi smart speaker data with echoes. Experiments show that the proposed DCUnet-MTL method yields a relative character error rate (CER) reduction of about 12.2% compared with the traditional approach of array processing plus a single-channel acoustic model. It also outperforms the recently proposed neural beamforming method.
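The abstract reports its result as a relative CER reduction. As a minimal sketch of how such a figure is typically derived, the Python below computes CER as character-level edit distance divided by reference length, then the relative reduction between two systems. The function names and the CER values in the usage note are hypothetical illustrations and are not taken from the paper, which does not describe its scoring tooling here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer: float, new_cer: float) -> float:
    """Fraction by which new_cer improves on baseline_cer."""
    return (baseline_cer - new_cer) / baseline_cer
```

For example, with a hypothetical baseline CER of 10.0% and a hypothetical new CER of 8.78%, `relative_reduction(10.0, 8.78)` evaluates to roughly 0.122, i.e. a relative reduction of about 12.2%.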
Pages: 104-110
Number of pages: 7