Modeling speech localization, talker identification, and word recognition in a multi-talker setting

Cited by: 12
Authors
Josupeit, Angela
Hohmann, Volker [1 ]
Institutions
[1] Carl von Ossietzky Univ Oldenburg, Med Phys, D-26111 Oldenburg, Germany
Source
Keywords
FUNDAMENTAL-FREQUENCY; INTERAURAL TIME; PERCEPTION; NOISE; SEGREGATION; SOUND; CUES; INTELLIGIBILITY; ENVIRONMENT; SEPARATION;
DOI
10.1121/1.4990375
CLC Classification Number
O42 [Acoustics];
Subject Classification Number
070206 ; 082403 ;
Abstract
This study introduces a model for solving three different auditory tasks in a multi-talker setting: target localization, target identification, and word recognition. The model was used to simulate psychoacoustic data from a call-sign-based listening test involving multiple spatially separated talkers [Brungart and Simpson (2007). Percept. Psychophys. 69(1), 79-91]. The main characteristics of the model are (i) the extraction of salient auditory features ("glimpses") from the multi-talker signal and (ii) the use of a classification method that finds the best target hypothesis by comparing feature templates from clean target signals to the glimpses derived from the multi-talker mixture. The four features used were periodicity, periodic energy, and periodicity-based interaural time and level differences. The model results exceeded chance level by a wide margin for all subtasks and conditions, and generally agreed closely with the subject data. This indicates that, despite their sparsity, glimpses provide sufficient information about a complex auditory scene. It also suggests that complex source superposition models may not be needed for auditory scene analysis; instead, simple models of clean speech may be sufficient to decode even complex multi-talker scenes. (C) 2017 Author(s).
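The classification step described in the abstract, finding the target hypothesis whose clean-speech feature template best matches the sparse glimpses extracted from the mixture, can be illustrated with a minimal sketch. This is not the paper's actual model; the function name, the Euclidean nearest-frame distance, and the array shapes below are all illustrative assumptions.

```python
import numpy as np

def classify_glimpses(glimpses, templates):
    """Hypothetical sketch of glimpse-based template matching.

    glimpses  : (n_glimpses, n_features) array of salient features
                extracted from the mixture (e.g. periodicity, periodic
                energy, periodicity-based ITD and ILD)
    templates : dict mapping a hypothesis label to an
                (n_frames, n_features) template array derived from
                clean target signals
    Returns the label whose template lies closest to the glimpses.
    """
    scores = {}
    for label, tmpl in templates.items():
        # Distance from every glimpse to every template frame,
        # via broadcasting: (n_glimpses, n_frames)
        d = np.linalg.norm(glimpses[:, None, :] - tmpl[None, :, :], axis=-1)
        # Score = mean distance to the nearest template frame
        scores[label] = d.min(axis=1).mean()
    # Best hypothesis = smallest mean nearest-frame distance
    return min(scores, key=scores.get)
```

The key point the sketch preserves is that the glimpses are sparse: only a handful of salient feature vectors are compared against each clean-signal template, yet the nearest-template decision can still identify the target.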
Pages: 35-54
Page count: 20
Related Papers
50 records in total
  • [1] Streaming Multi-talker Speech Recognition with Joint Speaker Identification
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    [J]. INTERSPEECH 2021, 2021, : 1782 - 1786
  • [2] Super-human multi-talker speech recognition: A graphical modeling approach
    Hershey, John R.
    Rennie, Steven J.
    Olsen, Peder A.
    Kristjansson, Trausti T.
    [J]. COMPUTER SPEECH AND LANGUAGE, 2010, 24 (01): : 45 - 66
  • [3] Streaming End-to-End Multi-Talker Speech Recognition
    Lu, Liang
    Kanda, Naoyuki
    Li, Jinyu
    Gong, Yifan
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 803 - 807
  • [4] END-TO-END MULTI-TALKER OVERLAPPING SPEECH RECOGNITION
    Tripathi, Anshuman
    Lu, Han
    Sak, Hasim
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6129 - 6133
  • [5] Variational Loopy Belief Propagation for Multi-talker Speech Recognition
    Rennie, Steven J.
    Hershey, John R.
    Olsen, Peder A.
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1367 - 1370
  • [6] Monaural multi-talker speech recognition using factorial speech processing models
    Khademian, Mahdi
    Homayounpour, Mohammad Mehdi
    [J]. SPEECH COMMUNICATION, 2018, 98 : 1 - 16
  • [7] Hierarchical Variational Loopy Belief Propagation for Multi-talker Speech Recognition
    Rennie, Steven J.
    Hershey, John R.
    Olsen, Peder A.
    [J]. 2009 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION & UNDERSTANDING (ASRU 2009), 2009, : 176 - 181
  • [8] Effects of face masks on speech recognition in multi-talker babble noise
    Toscano, Joseph C.
    Toscano, Cheyenne M.
    [J]. PLOS ONE, 2021, 16 (02):
  • [9] Audio-Visual Multi-Talker Speech Recognition in A Cocktail Party
    Wu, Yifei
    Li, Chenda
    Yang, Song
    Wu, Zhongqin
    Qian, Yanmin
    [J]. INTERSPEECH 2021, 2021, : 3021 - 3025
  • [10] Learning Contextual Language Embeddings for Monaural Multi-talker Speech Recognition
    Zhang, Wangyou
    Qian, Yanmin
    [J]. INTERSPEECH 2020, 2020, : 304 - 308