Comparing the influence of spectro-temporal integration in computational speech segregation

Cited: 2
Authors
Bentsen, Thomas [1 ]
May, Tobias [1 ]
Kressner, Abigail A. [1 ]
Dau, Torsten [1 ]
Affiliation
[1] Tech Univ Denmark, Hearing Syst Grp, DK-2800 Lyngby, Denmark
Keywords
computational speech segregation; binary masks; supervised learning; spectro-temporal integration; INTELLIGIBILITY; NOISE; PERCEPTION; ALGORITHM; MASKING
DOI
10.21437/Interspeech.2016-1025
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation performance, error distributions, and intelligibility predictions. Results indicated that exploiting context in the back-end can be problematic, even though such a spectro-temporal integration stage improves segregation performance. The results also emphasized the potential need for a single metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help identify the most effective spectro-temporal integration strategy for computational segregation systems.
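The two integration strategies described in the abstract can be illustrated with a short sketch. The following Python code is a minimal, hypothetical illustration, not the authors' implementation: delta_features appends front-end delta features computed by temporal regression over neighbouring frames, and stack_posterior_window gathers the first classifier's speech posteriors over a spectro-temporal window so that a second classifier could re-estimate the binary mask from the stacked vectors. All function names, window sizes, and array shapes are assumptions.

    # Minimal sketch of the two spectro-temporal integration strategies
    # described in the abstract. NOT the paper's implementation; names,
    # window sizes, and shapes are illustrative assumptions.
    import numpy as np

    def delta_features(feats, width=2):
        """Front-end integration: append delta (temporal-regression)
        features to a (num_frames, num_channels) feature matrix."""
        num_frames = feats.shape[0]
        denom = 2.0 * sum(d * d for d in range(1, width + 1))
        padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
        deltas = np.zeros_like(feats, dtype=float)
        for d in range(1, width + 1):
            # Regression term d * (x[t + d] - x[t - d]) for every frame t.
            deltas += d * (padded[width + d: width + d + num_frames]
                           - padded[width - d: width - d + num_frames])
        return np.concatenate([feats, deltas / denom], axis=1)

    def stack_posterior_window(post, t_ctx=1, f_ctx=1):
        """Back-end integration: for each time-frequency unit, stack the
        first classifier's speech posteriors from a
        (2*t_ctx + 1) x (2*f_ctx + 1) spectro-temporal window; a second
        classifier would then map each stacked vector to a mask label."""
        num_frames, num_channels = post.shape
        padded = np.pad(post, ((t_ctx, t_ctx), (f_ctx, f_ctx)), mode="edge")
        win = np.empty((num_frames, num_channels,
                        (2 * t_ctx + 1) * (2 * f_ctx + 1)))
        for t in range(num_frames):
            for f in range(num_channels):
                win[t, f] = padded[t: t + 2 * t_ctx + 1,
                                   f: f + 2 * f_ctx + 1].ravel()
        return win

    # Toy usage: 100 frames x 32 auditory channels.
    feats = np.random.rand(100, 32)
    print(delta_features(feats).shape)          # (100, 64)
    post = np.random.rand(100, 32)              # speech posteriors in [0, 1]
    print(stack_posterior_window(post).shape)   # (100, 32, 9)
    # Without back-end integration, a naive binary mask is per-unit:
    mask = post > 0.5

The sketch makes the trade-off studied in the paper concrete: the front-end strategy enlarges each unit's feature vector before the first classifier, whereas the back-end strategy smooths the mask decision by letting a second classifier see neighbouring posteriors.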
Pages: 3324–3328 (5 pages)