Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System

Authors
Kristjansson, T. [1]
Hershey, J. [1]
Olsen, P. [1]
Rennie, S. [1]
Gopinath, R. [1]
Affiliations
[1] IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA
Keywords
speech separation; Algonquin; Iroquois;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We describe a system for model-based speech separation that achieves super-human recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single-channel recording with remarkable results. It incorporates a novel method for performing two-talker speaker identification and gain estimation. We extend the method of model-based high-resolution signal reconstruction to incorporate temporal dynamics. We report on two methods for introducing dynamics: the first uses dynamics in the acoustic model space, and the second incorporates dynamics based on sentence grammar. The addition of temporal constraints leads to dramatic improvements in separation performance. Once the signals have been separated, they are recognized using speaker-dependent labeling.
Pages: 97-100
Number of pages: 4