Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Cited by: 0
Authors
Kristjansson, T. [1]
Hershey, J. [1]
Olsen, P. [1]
Rennie, S. [1]
Gopinath, R. [1]
Affiliations
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
speech separation; Algonquin; Iroquois
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe a system for model-based speech separation which achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in the separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.
Pages: 97-100
Page count: 4
Related papers
50 in total
  • [1] Super-human multi-talker speech recognition: A graphical modeling approach
    Hershey, John R.
    Rennie, Steven J.
    Olsen, Peder A.
    Kristjansson, Trausti T.
    COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01): 45-66
  • [2] MULTI-MICROPHONE NEURAL SPEECH SEPARATION FOR FAR-FIELD MULTI-TALKER SPEECH RECOGNITION
    Yoshioka, Takuya
    Erdogan, Hakan
    Chen, Zhuo
    Alleva, Fil
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5739-5743
  • [3] Monaural multi-talker speech recognition using factorial speech processing models
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    SPEECH COMMUNICATION, 2018, 98: 1-16
  • [4] Streaming Multi-talker Speech Recognition with Joint Speaker Identification
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    INTERSPEECH 2021, 2021: 1782-1786
  • [5] A microphone array beamforming-based system for multi-talker speech separation
    Hidri, Adel
    Amiri, Hamid
    INTERNATIONAL JOURNAL OF SIGNAL AND IMAGING SYSTEMS ENGINEERING, 2016, 9 (4-5): 209-217
  • [6] Modeling speech localization, talker identification, and word recognition in a multi-talker setting
    Josupeit, Angela
    Hohmann, Volker
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2017, 142 (01): 35-54
  • [7] Streaming End-to-End Multi-Talker Speech Recognition
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28: 803-807
  • [8] END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION
    Tripathi, Anshuman
    Lu, Han
    Sak, Hasim
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 6129-6133
  • [9] Variational Loopy Belief Propagation for Multi-talker Speech Recognition
    Rennie, Steven J.
    Hershey, John R.
    Olsen, Peder A.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009: 1367-1370
  • [10] Speech-derived haptic stimulation enhances speech recognition in a multi-talker background
    I. Sabina Răutu
    Xavier De Tiège
    Veikko Jousmäki
    Mathieu Bourguignon
    Julie Bertels
    Scientific Reports, 13