Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Times cited: 0
Authors
Kristjansson, T. [1]
Hershey, J. [1]
Olsen, P. [1]
Rennie, S. [1]
Gopinath, R. [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech separation; Algonquin; Iroquois;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We describe a system for model-based speech separation that achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on the sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.
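The abstract compresses several steps into one paragraph. The sketch below illustrates only the frame-level core of model-based two-talker separation, under assumptions that are ours and not the paper's: each speaker is a small diagonal Gaussian mixture over log-spectral bins, the mixture is approximated by the log-max rule y ≈ max(x_a, x_b) (a simpler stand-in for the Algonquin approximation named in the keywords), and the relative gain is found by a plain grid search. All function names (`frame_posteriors`, `estimate_source_a`, `shift_gain`, `estimate_gain`) are hypothetical. Temporal dynamics, whether acoustic or grammar-based, are omitted, so every frame is treated independently.

```python
import math

# Hypothetical speaker model: list of diagonal Gaussian states over
# log-spectral bins, each given as (prior, means, std_devs).
Model = list[tuple[float, list[float], list[float]]]


def _log_npdf(x: float, mu: float, sd: float) -> float:
    z = (x - mu) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def _ncdf(z: float) -> float:
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _pair_terms(y, mu_a, sd_a, mu_b, sd_b):
    """Per-bin pieces of p(y) under the log-max model y = max(x_a, x_b):
    a_dom = p(x_a = y, x_b <= y), b_dom = p(x_b = y, x_a <= y)."""
    a_dom = math.exp(_log_npdf(y, mu_a, sd_a)) * _ncdf((y - mu_b) / sd_b)
    b_dom = math.exp(_log_npdf(y, mu_b, sd_b)) * _ncdf((y - mu_a) / sd_a)
    return a_dom, b_dom


def frame_posteriors(y: list[float], model_a: Model, model_b: Model):
    """Return (log p(y), {(i, j): posterior of state pair}) for one frame."""
    log_joint = {}
    for i, (wa, mua, sda) in enumerate(model_a):
        for j, (wb, mub, sdb) in enumerate(model_b):
            ll = math.log(wa) + math.log(wb)
            for d, yd in enumerate(y):
                a_dom, b_dom = _pair_terms(yd, mua[d], sda[d], mub[d], sdb[d])
                ll += math.log(max(a_dom + b_dom, 1e-300))  # underflow guard
            log_joint[(i, j)] = ll
    # Log-sum-exp normalisation over all state pairs.
    m = max(log_joint.values())
    total = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return total, {k: math.exp(v - total) for k, v in log_joint.items()}


def estimate_source_a(y: list[float], model_a: Model, model_b: Model) -> list[float]:
    """MMSE estimate of talker A's log spectrum under the log-max model."""
    _, post = frame_posteriors(y, model_a, model_b)
    x_a = [0.0] * len(y)
    for (i, j), p_ij in post.items():
        _, mua, sda = model_a[i]
        _, mub, sdb = model_b[j]
        for d, yd in enumerate(y):
            a_dom, b_dom = _pair_terms(yd, mua[d], sda[d], mub[d], sdb[d])
            p_a = a_dom / max(a_dom + b_dom, 1e-300)
            # If B dominates, x_a is A's Gaussian truncated to x_a <= y.
            z = (yd - mua[d]) / sda[d]
            hazard = math.exp(_log_npdf(z, 0.0, 1.0)) / max(_ncdf(z), 1e-300)
            masked = mua[d] - sda[d] * hazard
            x_a[d] += p_ij * (p_a * yd + (1.0 - p_a) * masked)
    return x_a


def shift_gain(model: Model, g: float) -> Model:
    """Apply a log-domain gain g to every state mean of a model."""
    return [(w, [m + g for m in mu], sd) for w, mu, sd in model]


def estimate_gain(frames, model_a: Model, model_b: Model, grid) -> float:
    """Grid-search the relative gain of B that maximises total log p(y)."""
    return max(grid, key=lambda g: sum(
        frame_posteriors(y, model_a, shift_gain(model_b, g))[0] for y in frames))
```

A typical (hypothetical) flow is to pick a gain with `estimate_gain`, apply it with `shift_gain`, and then call `estimate_source_a`; swapping the two models gives the estimate for the other talker. The log-max rule is cruder than Algonquin's iterative linearisation of the log-sum, but it yields closed-form per-bin likelihoods, which keeps the sketch short.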
Pages: 97-100
Number of pages: 4