Speech Separation and Recognition Using CASA Segmentation and Language-Based Grouping

Cited by: 0

Authors:

Karpukhin, Ivan [1,2]

Konushin, Anton [1]

Affiliations:

[1] Lomonosov Moscow State Univ, Fac Computat Math & Cybernet, Moscow 119991, Russia

[2] Yandex, Moscow 119021, Russia

Source:

ADVANCED SCIENCE LETTERS | 2018 / Volume 24 / Issue 10

Keywords:

Speech Recognition; Monaural Speech Separation; Cocktail-Party Problem; CASA;

DOI:

10.1166/asl.2018.12994

Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

We consider monaural speech recognition in multi-talker environments with difficult non-stationary noise. We propose a new computational auditory scene analysis (CASA) method that uses a language model together with acoustic continuity for speech separation. Unlike previous work, our algorithm does not depend on a fixed set of speakers, so it could be used in a general-purpose speech recognition system. The algorithm works in two stages. First, it segments the signal in the time-frequency domain. Then, a grouping stage composes the segments into streams, each corresponding to either speech or noise. In our approach, text recognition and separation are parts of a single process. Our experiments show a 17% WER improvement over the baseline in a 0 dB environment.
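
For readers unfamiliar with the WER metric quoted in the abstract, the following is a minimal sketch of the standard word error rate definition: word-level edit distance divided by the number of reference words. It is not the authors' evaluation code. The function name `word_error_rate` and the example sentences are illustrative assumptions, and the abstract does not say whether the 17% figure is a relative or an absolute improvement.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative helper only; whitespace tokenization is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because WER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many inserted words.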


Pages: 7650-7654

Page count: 5
