Speech Separation Algorithm Using Gated Recurrent Network Based on Microphone Array

Authors
Zhao, Xiaoyan [1 ]
Zhou, Lin [2 ]
Xie, Yue [1 ]
Tong, Ying [1 ]
Shi, Jingang [3 ]
Affiliations
[1] Nanjing Inst Technol, Sch Informat & Commun Engn, Nanjing 211167, Peoples R China
[2] Southeast Univ, Sch Informat Sci & Engn, Nanjing 210096, Peoples R China
[3] Univ Oulu, FI-90014 Oulu, Finland
Source
Keywords
Microphone array; speech separation; gated recurrent unit network; gammatone sub-band steered response power-phase transform; spatial spectrum; classification
DOI
10.32604/iasc.2023.030180
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Speech separation is an active research topic that plays an important role in numerous applications, such as speaker recognition, hearing prostheses, and autonomous robots. Many algorithms have been proposed to improve separation performance, but speech separation in reverberant, noisy environments remains challenging. To address this, this paper proposes a speech separation algorithm that uses a gated recurrent unit (GRU) network with a microphone array, with the aim of improving separation performance and reducing computational cost. The algorithm extracts the sub-band steered response power-phase transform (SRP-PHAT), weighted by a gammatone filter, as the separation feature because it carries discriminative and robust spatial position information. Because the GRU network has the advantage of processing time-series data with faster training and fewer trainable parameters, a GRU model processes the separation features of several sequential frames in the same sub-band to estimate the ideal ratio mask (IRM). The algorithm decomposes the mixture signals into time-frequency (TF) units using a gammatone filter bank in the frequency domain, and reconstructs the target speech in the frequency domain by masking the mixture signal according to the estimated IRM. Performing both the decomposition and the reconstruction in the frequency domain reduces the total computational cost. Experimental results demonstrate that the proposed algorithm achieves omnidirectional speech separation in noisy and reverberant environments, provides good speech quality and intelligibility, and has generalization capacity with respect to reverberation.
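The masking step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the IRM formula, so the commonly used square-root power-ratio form is assumed here, and the function names (`ideal_ratio_mask`, `apply_mask`), the array shapes, and the random data are all hypothetical. In the paper the mask is estimated by the GRU network from SRP-PHAT features; here it is computed directly from known target and interference powers purely to show how a ratio mask scales each TF unit.

```python
import numpy as np

def ideal_ratio_mask(target_power, interference_power, eps=1e-12):
    """Assumed IRM form: sqrt(S / (S + N)) for each time-frequency unit.

    target_power and interference_power are non-negative arrays of shape
    (sub_bands, frames). eps avoids division by zero in silent units.
    """
    return np.sqrt(target_power / (target_power + interference_power + eps))

def apply_mask(mixture_tf, mask):
    """Scale each TF unit of the mixture by the (estimated) mask value."""
    return mixture_tf * mask

# Hypothetical toy data: 4 sub-bands, 6 frames.
rng = np.random.default_rng(0)
target_power = rng.uniform(0.0, 1.0, size=(4, 6))
interference_power = rng.uniform(0.0, 1.0, size=(4, 6))
mixture_tf = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))

irm = ideal_ratio_mask(target_power, interference_power)
estimate_tf = apply_mask(mixture_tf, irm)
```

Because the mask lies in [0, 1], masking can only attenuate a TF unit, never amplify it; units dominated by the target keep most of their energy, while units dominated by interference are suppressed.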
Pages: 3086-3099
Page count: 14