A Cantonese Audio-Visual Emotional Speech (CAVES) dataset

Cited by: 1

Authors:

Chong C.S. [1]

Davis C. [1]

Kim J. [1]

Affiliation:

[1] The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, 2751, NSW

Source:

Behavior Research Methods | 2024, Vol. 56, Issue 5

Funding:

Australian Research Council

Keywords:

Auditory and visual expressions; Cantonese dataset; Dataset evaluation; Emotional speech

DOI:

10.3758/s13428-023-02270-7

Abstract:

We present a Cantonese emotional speech dataset that is suitable for use in research investigating the auditory and visual expression of emotion in tonal languages. This unique dataset consists of auditory and visual recordings of ten native speakers of Cantonese uttering 50 sentences each in the six basic emotions (angry, happy, sad, surprise, fear, and disgust) plus neutral. The visual recordings have a full HD resolution of 1920 × 1080 pixels and were recorded at 50 fps. The important features of the dataset are outlined along with the factors considered when compiling the dataset. A validation study of the recorded emotion expressions was conducted in which 15 native Cantonese perceivers completed a forced-choice emotion identification task. The variability of the speakers and the sentences was examined by testing the degree of concordance between the intended and the perceived emotion. We compared these results with those of other emotion perception and evaluation studies that have tested spoken emotions in languages other than Cantonese. The dataset is freely available for research purposes. © 2023, The Author(s).
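The concordance check described in the abstract can be illustrated with a minimal sketch. The abstract does not state which statistic was used, so this example assumes the simplest reading: the proportion of forced-choice trials in which the perceived emotion matches the intended one, computed per intended emotion. The function names and the toy responses are hypothetical and are not taken from the CAVES dataset.

```python
from collections import Counter, defaultdict


def concordance_by_emotion(responses):
    """Return, per intended emotion, the share of trials identified correctly.

    `responses` is an iterable of (intended, perceived) label pairs.
    This is an assumed measure, not necessarily the one used by the authors.
    """
    totals = Counter()
    hits = Counter()
    for intended, perceived in responses:
        totals[intended] += 1
        if intended == perceived:
            hits[intended] += 1
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


def confusion_counts(responses):
    """Count how often each intended emotion was perceived as each label."""
    table = defaultdict(Counter)
    for intended, perceived in responses:
        table[intended][perceived] += 1
    return table


# Made-up responses for illustration only (not real CAVES data).
toy_responses = [
    ("angry", "angry"),
    ("angry", "disgust"),
    ("happy", "happy"),
    ("sad", "neutral"),
    ("sad", "sad"),
    ("sad", "sad"),
]

rates = concordance_by_emotion(toy_responses)
confusions = confusion_counts(toy_responses)
```

The per-emotion rates show which expressions are identified reliably, while the confusion counts show which emotions are mistaken for which. The same grouping could be applied per speaker or per sentence to examine their variability.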

Citation

Pages: 5264-5278

Page count: 14

