A novel frequency warping scale for speech emotion recognition

Citations: 0

Authors:

Singh, Premjeet [1]

Saha, Goutam [1]

Affiliation:

[1] Indian Inst Technol, Dept Elect & Elect Commun Engn, Kharagpur, W Bengal, India

Source:

INTERSPEECH 2023 | 2023

Keywords:

Speech emotion recognition; Non-linear frequency warping; Constant-Q transform; Cepstral coefficients; Features; Magnitude

DOI:

10.21437/Interspeech.2023-1600

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We investigate an optimised non-linear frequency warping scale for speech emotion recognition (SER). The proposed scale maps the speech spectrogram onto another time-frequency domain which is invariant to speaker-specific variations. Generally, the famous mel-scale designed on human audio perception is considered the de facto standard of frequency warping. However, designed mainly for speech recognition, the generalisability of mel on other speech processing tasks is debatable. Our experiments show that an emotion-specific scale designed on an SER database outperforms the standard mel-scale. Along with performance improvement, the proposed approach also provides insight into the emotion-relevant frequency regions for SER. Despite the database-dependent design of our approach, we find that the scale obtained from our experiments also shows SER performance improvement when tested on two other databases.
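For orientation, the mel-scale baseline mentioned in the abstract can be sketched as below. This shows only the conventional mel warping, not the emotion-specific scale the paper proposes, and the record does not say which mel variant the authors used. The widely used 2595·log10(1 + f/700) form is assumed here, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to mels (assumed 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: map a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Return n_bands + 2 band edges in Hz, equally spaced on the mel axis."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Edges are dense at low frequencies and sparse at high frequencies.
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(f"{edge:8.1f} Hz")
```

Because the edges are uniform in mels, the bands widen with frequency in Hz. A learned or optimised warping scale would replace this fixed mapping with a different allocation of resolution across frequency.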

Pages: 3647-3651

Number of pages: 5
