Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Cited by: 0

Authors:

Kristjansson, T. [1]

Hershey, J. [1]

Olsen, P. [1]

Rennie, S. [1]

Gopinath, R. [1]

Affiliation:

[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech separation; Algonquin; Iroquois;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We describe a system for model-based speech separation that achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.

Pages: 97-100

Page count: 4
