Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Cited by: 0

Authors:

Kristjansson, T. [1]

Hershey, J. [1]

Olsen, P. [1]

Rennie, S. [1]

Gopinath, R. [1]

Affiliations:

[1] IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Source:

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 | 2006

Keywords:

speech separation; Algonquin; Iroquois

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We describe a system for model-based speech separation that achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.


Pages: 97-100

Page count: 4
