Comparing the influence of spectro-temporal integration in computational speech segregation

Cited: 2
Authors
Bentsen, Thomas [1 ]
May, Tobias [1 ]
Kressner, Abigail A. [1 ]
Dau, Torsten [1 ]
Affiliation
[1] Tech Univ Denmark, Hearing Syst Grp, DK-2800 Lyngby, Denmark
Keywords
computational speech segregation; binary masks; supervised learning; spectro-temporal integration; INTELLIGIBILITY; NOISE; PERCEPTION; ALGORITHM; MASKING
DOI
10.21437/Interspeech.2016-1025
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation performance, error distributions, and intelligibility predictions. Results indicated that exploiting context in the back-end can be problematic, even though such a spectro-temporal integration stage improves segregation performance. The results also emphasized the potential need for a single metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help identify the most effective spectro-temporal integration strategy for computational segregation systems.
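The two integration strategies described in the abstract can be illustrated with a short sketch. The following Python code is a minimal, hypothetical illustration, not the authors' implementation: delta_features appends front-end delta features computed by temporal regression over neighbouring frames, and stack_posterior_window gathers the first classifier's speech posteriors over a spectro-temporal window so that a second classifier could re-estimate the binary mask from the stacked vectors. All function names, window sizes, and array shapes are assumptions.

    # Minimal sketch of the two spectro-temporal integration strategies
    # described in the abstract. NOT the paper's implementation; names,
    # window sizes, and shapes are illustrative assumptions.
    import numpy as np

    def delta_features(feats, width=2):
        """Front-end integration: append delta (temporal-regression)
        features to a (num_frames, num_channels) feature matrix."""
        num_frames = feats.shape[0]
        denom = 2.0 * sum(d * d for d in range(1, width + 1))
        padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
        deltas = np.zeros_like(feats, dtype=float)
        for d in range(1, width + 1):
            # Regression term d * (x[t + d] - x[t - d]) for every frame t.
            deltas += d * (padded[width + d: width + d + num_frames]
                           - padded[width - d: width - d + num_frames])
        return np.concatenate([feats, deltas / denom], axis=1)

    def stack_posterior_window(post, t_ctx=1, f_ctx=1):
        """Back-end integration: for each time-frequency unit, stack the
        first classifier's speech posteriors from a
        (2*t_ctx + 1) x (2*f_ctx + 1) spectro-temporal window; a second
        classifier would then map each stacked vector to a mask label."""
        num_frames, num_channels = post.shape
        padded = np.pad(post, ((t_ctx, t_ctx), (f_ctx, f_ctx)), mode="edge")
        win = np.empty((num_frames, num_channels,
                        (2 * t_ctx + 1) * (2 * f_ctx + 1)))
        for t in range(num_frames):
            for f in range(num_channels):
                win[t, f] = padded[t: t + 2 * t_ctx + 1,
                                   f: f + 2 * f_ctx + 1].ravel()
        return win

    # Toy usage: 100 frames x 32 auditory channels.
    feats = np.random.rand(100, 32)
    print(delta_features(feats).shape)          # (100, 64)
    post = np.random.rand(100, 32)              # speech posteriors in [0, 1]
    print(stack_posterior_window(post).shape)   # (100, 32, 9)
    # Without back-end integration, a naive binary mask is per-unit:
    mask = post > 0.5

The sketch makes the trade-off studied in the paper concrete: the front-end strategy enlarges each unit's feature vector before the first classifier, whereas the back-end strategy smooths the mask decision by letting a second classifier see neighbouring posteriors.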
Pages: 3324–3328 (5 pages)