Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Cited by: 0
Authors
Kristjansson, T. [1 ]
Hershey, J. [1 ]
Olsen, P. [1 ]
Rennie, S. [1 ]
Gopinath, R. [1 ]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech separation; Algonquin; Iroquois;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We describe a system for model-based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.
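To make the model-based separation idea concrete, below is a minimal sketch of single-channel separation of two speakers using per-speaker Gaussian mixture models over log-spectral features and the log-max ("max") interaction approximation. This is not the paper's Algonquin/Iroquois implementation: the feature dimension, mixture sizes, the surrogate likelihood for the max model, and the binary-mask reconstruction are illustrative assumptions, and the temporal dynamics (acoustic-level and grammar-level) described in the abstract are omitted.

```python
# Minimal sketch of model-based single-channel two-talker separation
# under the log-max approximation y ~ max(x_a, x_b).
# All sizes and the toy GMMs are illustrative assumptions, not the
# IBM 2006 system's actual models or inference.
import numpy as np

rng = np.random.default_rng(0)

D = 64   # number of log-spectral bins (assumed)
K = 8    # Gaussian components per speaker model (assumed)

def make_gmm(mean_shift):
    """Toy diagonal-covariance GMM over log-spectra for one speaker."""
    means = rng.normal(loc=mean_shift, scale=1.0, size=(K, D))
    variances = np.full((K, D), 0.5)
    weights = np.full(K, 1.0 / K)
    return means, variances, weights

gmm_a = make_gmm(0.0)
gmm_b = make_gmm(1.0)

def separate_frame(y, gmm_a, gmm_b):
    """Search all pairs of components (one per speaker), score the mixture
    frame y with a crude max-model surrogate, and reconstruct each source
    with a binary mask from the best pair."""
    mu_a, var_a, w_a = gmm_a
    mu_b, var_b, w_b = gmm_b
    best, best_pair = -np.inf, None
    for i in range(len(w_a)):
        for j in range(len(w_b)):
            # Under the max model, each bin of y is attributed to whichever
            # speaker's mean is larger; score y under that dominant Gaussian.
            dominant_a = mu_a[i] >= mu_b[j]
            mu = np.where(dominant_a, mu_a[i], mu_b[j])
            var = np.where(dominant_a, var_a[i], var_b[j])
            ll = (np.log(w_a[i]) + np.log(w_b[j])
                  - 0.5 * np.sum(np.log(2 * np.pi * var)
                                 + (y - mu) ** 2 / var))
            if ll > best:
                best, best_pair = ll, (i, j, dominant_a)
    i, j, dominant_a = best_pair
    # Bins dominated by speaker A keep the observation; the masked speaker's
    # bins fall back to its component mean (simple prior-based fill-in).
    x_a_hat = np.where(dominant_a, y, gmm_a[0][i])
    x_b_hat = np.where(dominant_a, gmm_b[0][j], y)
    return x_a_hat, x_b_hat

# Example: separate one synthetic mixture frame generated by the max model.
y = np.maximum(rng.normal(0.0, 1.0, D), rng.normal(1.0, 1.0, D))
x_a_hat, x_b_hat = separate_frame(y, gmm_a, gmm_b)
print(x_a_hat[:5], x_b_hat[:5])
```

The sketch scores frames independently; the system described in the abstract instead couples frames through temporal dynamics in the acoustic model space or through the sentence grammar, performs joint speaker identification and gain estimation, and then recognizes the reconstructed signals with speaker-dependent models.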
Pages: 97-100
Page count: 4
Related papers
50 in total
  • [41] KNOWLEDGE TRANSFER IN PERMUTATION INVARIANT TRAINING FOR SINGLE-CHANNEL MULTI-TALKER SPEECH RECOGNITION
    Tan, Tian
    Qian, Yanmin
    Yu, Dong
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5714 - 5718
  • [42] A VISUAL-PILOT DEEP FUSION FOR TARGET SPEECH SEPARATION IN MULTI-TALKER NOISY ENVIRONMENT
    Li, Yun
    Liu, Zhang
    Na, Yueyue
    Wang, Ziteng
    Tian, Biao
    Fu, Qiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 4442 - 4446
  • [43] Permutation invariant training of deep models for speaker-independent multi-talker speech separation
    Takahashi, Kohei
    Shiraishi, Toshihiko
    MECHANICAL ENGINEERING JOURNAL, 2023,
  • [44] A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation
    Yin, Lu
    Li, Junfeng
    Yan, Yonghong
    Akagi, Masato
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2020, E103D (07): : 1732 - 1743
  • [45] Hierarchical Encoding of Attended Auditory Objects in Multi-talker Speech Perception
    O'Sullivan, James
    Herrero, Jose
    Smith, Elliot
    Schevon, Catherine
    McKhann, Guy M.
    Sheth, Sameer A.
    Mehta, Ashesh D.
    Mesgarani, Nima
    NEURON, 2019, 104 (06) : 1195 - +
  • [46] The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene
    Rimmele, Johanna M.
    Golumbic, Elana Zion
    Schroeger, Erich
    Poeppel, David
    CORTEX, 2015, 68 : 144 - 154
  • [47] Multi-talker Speech Recognition Based on Blind Source Separation with Ad hoc Microphone Array Using Smartphones and Cloud Storage
    Ochi, Keiko
    Ono, Nobutaka
    Miyabe, Shigeki
    Makino, Shoji
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3369 - 3373
  • [48] The effect of nearby maskers on speech intelligibility in reverberant, multi-talker environments
    Westermann, Adam
    Buchholz, Joerg M.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 141 (03): : 2214 - 2223
  • [49] Selective cortical representation of attended speaker in multi-talker speech perception
    Nima Mesgarani
    Edward F. Chang
    Nature, 2012, 485 : 233 - 236
  • [50] Speaker Identification in Multi-Talker Overlapping Speech Using Neural Networks
    Tran, Van-Thuan
    Tsai, Wei-Ho
    IEEE ACCESS, 2020, 8 : 134868 - 134879