Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Cited by: 0
Authors
Kristjansson, T. [1 ]
Hershey, J. [1 ]
Olsen, P. [1 ]
Rennie, S. [1 ]
Gopinath, R. [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech separation; Algonquin; Iroquois;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We describe a system for model-based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.
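To make the model-based separation idea concrete, below is a minimal sketch of single-channel separation of two speakers using per-speaker Gaussian mixture models over log-spectral features and the log-max ("max") interaction approximation. This is not the paper's Algonquin/Iroquois implementation: the feature dimension, mixture sizes, the surrogate likelihood for the max model, and the binary-mask reconstruction are illustrative assumptions, and the temporal dynamics (acoustic-level and grammar-level) described in the abstract are omitted.

```python
# Minimal sketch of model-based single-channel two-talker separation
# under the log-max approximation y ~ max(x_a, x_b).
# All sizes and the toy GMMs are illustrative assumptions, not the
# IBM 2006 system's actual models or inference.
import numpy as np

rng = np.random.default_rng(0)

D = 64   # number of log-spectral bins (assumed)
K = 8    # Gaussian components per speaker model (assumed)

def make_gmm(mean_shift):
    """Toy diagonal-covariance GMM over log-spectra for one speaker."""
    means = rng.normal(loc=mean_shift, scale=1.0, size=(K, D))
    variances = np.full((K, D), 0.5)
    weights = np.full(K, 1.0 / K)
    return means, variances, weights

gmm_a = make_gmm(0.0)
gmm_b = make_gmm(1.0)

def separate_frame(y, gmm_a, gmm_b):
    """Search all pairs of components (one per speaker), score the mixture
    frame y with a crude max-model surrogate, and reconstruct each source
    with a binary mask from the best pair."""
    mu_a, var_a, w_a = gmm_a
    mu_b, var_b, w_b = gmm_b
    best, best_pair = -np.inf, None
    for i in range(len(w_a)):
        for j in range(len(w_b)):
            # Under the max model, each bin of y is attributed to whichever
            # speaker's mean is larger; score y under that dominant Gaussian.
            dominant_a = mu_a[i] >= mu_b[j]
            mu = np.where(dominant_a, mu_a[i], mu_b[j])
            var = np.where(dominant_a, var_a[i], var_b[j])
            ll = (np.log(w_a[i]) + np.log(w_b[j])
                  - 0.5 * np.sum(np.log(2 * np.pi * var)
                                 + (y - mu) ** 2 / var))
            if ll > best:
                best, best_pair = ll, (i, j, dominant_a)
    i, j, dominant_a = best_pair
    # Bins dominated by speaker A keep the observation; the masked speaker's
    # bins fall back to its component mean (simple prior-based fill-in).
    x_a_hat = np.where(dominant_a, y, gmm_a[0][i])
    x_b_hat = np.where(dominant_a, gmm_b[0][j], y)
    return x_a_hat, x_b_hat

# Example: separate one synthetic mixture frame generated by the max model.
y = np.maximum(rng.normal(0.0, 1.0, D), rng.normal(1.0, 1.0, D))
x_a_hat, x_b_hat = separate_frame(y, gmm_a, gmm_b)
print(x_a_hat[:5], x_b_hat[:5])
```

The sketch scores frames independently; the system described in the abstract instead couples frames through temporal dynamics in the acoustic model space or through the sentence grammar, performs joint speaker identification and gain estimation, and then recognizes the reconstructed signals with speaker-dependent models.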
Pages: 97-100
Page count: 4
Related papers
50 in total
  • [41] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718
  • [42] A VISUAL-PILOT DEEP FUSION FOR TARGET SPEECH SEPARATION IN MULTI-TALKER NOISY ENVIRONMENT
    Li, Yun
    Liu, Zhang
    Na, Yueyue
    Wang, Ziteng
    Tian, Biao
    Fu, Qiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4442 - 4446
  • [43] Permutation invariant training of deep models for speaker-independent multi-talker speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
    MECHANICAL ENGINEERING JOURNAL, 2023,
  • [44] A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
    Yin, Lu
    Li, Junfeng
    Yan, Yonghong
    Akagi, Masato
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (07): : 1732 - 1743
  • [45] Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception
    O'Sullivan, James
    Herrero, Jose
    Smith, Elliot
    Schevon, Catherine
    McKhann, Guy M.
    Sheth, Sameer A.
    Mehta, Ashesh D.
    Mesgarani, Nima
    NEURON, 2019, 104 (06) : 1195 - +
  • [46] The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene
    Rimmele, Johanna M.
    Golumbic, Elana Zion
    Schroeger, Erich
    Poeppel, David
    CORTEX, 2015, 68 : 144 - 154
  • [47] Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage
    Ochi, Keiko
    Ono, Nobutaka
    Miyabe, Shigeki
    Makino, Shoji
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3369 - 3373
  • [48] The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments
    Westermann, Adam
    Buchholz, Joerg M.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (03): : 2214 - 2223
  • [49] Selective cortical representation of attended speaker in multi-talker speech perception
    Nima Mesgarani
    Edward F. Chang
    Nature, 2012, 485 : 233 - 236
  • [50] Speaker Identification in Multi-Talker Overlapping Speech Using Neural Networks
    Tran, Van-Thuan
    Tsai, Wei-Ho
    IEEE ACCESS, 2020, 8 : 134868 - 134879