Speech Separation and Recognition Using CASA Segmentation and Language-Based Grouping

Authors
Karpukhin, Ivan [1, 2]
Konushin, Anton [1]
Affiliations
[1] Lomonosov Moscow State Univ, Fac Computat Math & Cybernet, Moscow 119991, Russia
[2] Yandex, Moscow 119021, Russia
Keywords
Speech Recognition; Monaural Speech Separation; Cocktail-Party Problem; CASA
DOI
10.1166/asl.2018.12994
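Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]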
Discipline classification codes
07; 0710; 09
Abstract
We consider monaural speech recognition in multi-talker environments with difficult non-stationary noise. We propose a new computational auditory scene analysis (CASA) method that uses a language model together with acoustic continuity for speech separation. Unlike previous work, our algorithm does not depend on a fixed set of speakers, so it could be used in a general-purpose speech recognition system. The algorithm works in two stages. First, it produces a time-frequency segmentation of the signal. Then, a grouping stage composes the segments into streams, with each stream corresponding to either speech or noise. In our approach, text recognition and separation are parts of a single process. Our experiments show a 17% WER improvement over the baseline in a 0 dB environment.
Pages: 7650-7654
Number of pages: 5