Comparing the influence of spectro-temporal integration in computational speech segregation

Cited by: 2

Authors:

Bentsen, Thomas [1]

May, Tobias [1]

Kressner, Abigail A. [1]

Dau, Torsten [1]

Affiliations:

[1] Tech Univ Denmark, Hearing Syst Grp, DK-2800 Lyngby, Denmark

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

computational speech segregation; binary masks; supervised learning; spectro-temporal integration; INTELLIGIBILITY; NOISE; PERCEPTION; ALGORITHM; MASKING;

DOI:

10.21437/Interspeech.2016-1025

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

The goal of computational speech segregation systems is to automatically segregate a target speaker from interfering maskers. Typically, these systems include a feature extraction stage in the front-end and a classification stage in the back-end. A spectro-temporal integration strategy can be applied either in the front-end, using so-called delta features, or in the back-end, using a second classifier that exploits the posterior probability of speech from the first classifier across a spectro-temporal window. This study systematically analyzes the influence of such stages on segregation performance, error distributions, and intelligibility predictions. The results indicated that exploiting context in the back-end could be problematic, even though such a spectro-temporal integration stage improves segregation performance. The results also emphasized the potential need for a single metric that comprehensively predicts computational segregation performance and correlates well with intelligibility. The outcome of this study could help to identify the most effective spectro-temporal integration strategy for computational segregation systems.

Pages: 3324-3328

Number of pages: 5
