Investigating Speech Enhancement and Perceptual Quality for Speech Emotion Recognition

Cited by: 10

Authors:

Avila, Anderson R. [1,2]

Alam, Jahangir [2]

O'Shaughnessy, Douglas [1]

Falk, Tiago H. [1]

Affiliations:

[1] Univ Quebec, INRS EMT, Ste Foy, PQ, Canada

[2] CRIM, Montreal, PQ, Canada

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

speech recognition; human-computer interaction; computational paralinguistics;

DOI:

10.21437/Interspeech.2018-2350

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this study, the performance of two enhancement algorithms is investigated in terms of perceptual quality as well as with respect to their impact on speech emotion recognition (SER). The SER system adopted is based on the benchmark system provided for the AVEC Challenge 2016. The three objective measures adopted are the speech-to-reverberation modulation energy ratio (SRMR), the perceptual evaluation of speech quality (PESQ) and the perceptual objective listening quality assessment (POLQA). Evaluations are conducted on speech files from the RECOLA dataset, which provides spontaneous interactions in French of 27 subjects. Clean speech files are corrupted with different levels of background noise and reverberation. Results show that applying enhancement prior to the SER task can improve SER performance in more degraded scenarios. We also show that quality measures can be an important asset as an indicator of enhancement algorithm performance towards SER, with SRMR and POLQA providing the most reliable results.


Pages: 3663-3667

Page count: 5

