Speech Separation and Recognition Using CASA Segmentation and Language-Based Grouping

Cited by: 0
Authors
Karpukhin, Ivan [1, 2]
Konushin, Anton [1 ]
Affiliations
[1] Lomonosov Moscow State Univ, Fac Computat Math & Cybernet, Moscow 119991, Russia
[2] Yandex, Moscow 119021, Russia
Keywords
Speech Recognition; Monaural Speech Separation; Cocktail-Party Problem; CASA
DOI
10.1166/asl.2018.12994
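Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]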
Discipline classification codes
07; 0710; 09
Abstract
We consider the problem of monaural speech recognition in multi-talker environments with difficult non-stationary noise. We propose a new method of computational auditory scene analysis (CASA) that uses a language model along with acoustic continuity for speech separation. Unlike previous work, our algorithm does not depend on a fixed set of speakers, so it could be used in a general-purpose speech recognition system. The algorithm works in two stages. First, it produces a time-frequency segmentation of the signal. Then, a grouping stage composes the segments into streams, with each stream corresponding to either speech or noise. In our approach, text recognition and separation are parts of a single process. Our experiments show a 17% WER improvement over the baseline in a 0 dB environment.
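The abstract reports its result as a word error rate (WER) improvement. As a reminder of how that metric is usually computed, here is a minimal sketch, assuming the standard definition: the word-level edit distance between the reference and the recognized text, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not taken from the paper, and the abstract does not say whether the 17% figure is an absolute or a relative change.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.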
Pages: 7650-7654
Number of pages: 5