A Cantonese Audio-Visual Emotional Speech (CAVES) dataset

Cited by: 1
Authors
Chong C.S. [1 ]
Davis C. [1 ]
Kim J. [1 ]
Affiliations
[1] The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia
Funding
Australian Research Council
Keywords
Auditory and visual expressions; Cantonese dataset; Dataset evaluation; Emotional speech
DOI
10.3758/s13428-023-02270-7
Abstract
We present a Cantonese emotional speech dataset that is suitable for use in research investigating the auditory and visual expression of emotion in tonal languages. This unique dataset consists of auditory and visual recordings of ten native speakers of Cantonese uttering 50 sentences each in the six basic emotions (angry, happy, sad, surprise, fear, and disgust) plus neutral. The visual recordings have a full HD resolution of 1920 × 1080 pixels and were recorded at 50 fps. The important features of the dataset are outlined along with the factors considered when compiling it. A validation study of the recorded emotion expressions was conducted in which 15 native Cantonese perceivers completed a forced-choice emotion identification task. The variability of the speakers and the sentences was examined by testing the degree of concordance between the intended and the perceived emotion. We compared these results with those of other emotion perception and evaluation studies that have tested spoken emotions in languages other than Cantonese. The dataset is freely available for research purposes. © 2023, The Author(s).
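The validation analysis described above reduces to tallying, for each intended emotion, how often perceivers selected that same emotion in the forced-choice task. Below is a minimal Python sketch of that concordance computation; the data layout and function names are illustrative assumptions, not code distributed with the dataset.

```python
from collections import Counter, defaultdict

# Forced-choice response categories (the six basic emotions plus neutral).
EMOTIONS = ["angry", "happy", "sad", "surprise", "fear", "disgust", "neutral"]

# Each trial pairs the speaker's intended emotion with the label a
# perceiver chose. These rows are made-up illustrative data.
trials = [
    ("angry", "angry"), ("angry", "disgust"),
    ("happy", "happy"), ("sad", "sad"),
    ("fear", "surprise"), ("neutral", "neutral"),
]

def concordance(trials):
    """Per-emotion hit rate: the proportion of trials on which the
    perceived label matched the intended emotion."""
    hits, totals = Counter(), Counter()
    for intended, perceived in trials:
        totals[intended] += 1
        hits[intended] += intended == perceived
    return {e: hits[e] / totals[e] for e in totals}

def confusion_matrix(trials):
    """Counts of each perceived label, grouped by intended emotion."""
    matrix = defaultdict(Counter)
    for intended, perceived in trials:
        matrix[intended][perceived] += 1
    return matrix

if __name__ == "__main__":
    rates = concordance(trials)
    for emotion in EMOTIONS:
        if emotion in rates:
            print(f"{emotion:>8}: {rates[emotion]:.2f}")
```

The confusion matrix is the more informative summary, since off-diagonal counts show which emotions perceivers systematically mistake for one another (e.g., fear perceived as surprise).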
Pages: 5264 - 5278
Number of pages: 14
Related papers
50 records in total
  • [1] Author Correction: A Cantonese Audio-Visual Emotional Speech (CAVES) dataset
    Chong, Chee Seng
    Davis, Chris
    Kim, Jeesun
    [J]. Behavior Research Methods, 2024, 56 (6): 6410 - 6410
  • [2] A Cantonese Audio-Visual Emotional Speech (CAVES) dataset (Nov, 10.3758/s13428-023-02270-7, 2023)
    Chong, Chee Seng
    Davis, Chris
    Kim, Jeesun
    [J]. Behavior Research Methods, 2024, 56 (6): 6410 - 6410
  • [3] CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition
    Dai, Wenliang
    Cahyawijaya, Samuel
    Yu, Tiezheng
    Barezi, Elham J.
    Xu, Peng
    Yiu, Cheuk Tung Shadow
    Frieske, Rita
    Lovenia, Holy
    Winata, Genta Indra
    Chen, Qifeng
    Ma, Xiaojuan
    Shi, Bertram E.
    Fung, Pascale
    [J]. LREC 2022: Thirteenth International Conference on Language Resources and Evaluation, 2022: 6786 - 6793
  • [4] EMID: An Emotional Aligned Dataset in Audio-Visual Modality
    Zou, Jialing
    Mei, Jiahao
    Ye, Guangze
    Huai, Tianyu
    Shen, Qiwei
    Dong, Daoguo
    [J]. Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation (MCGE 2023): New Methods and Practice, 2023: 41 - 48
  • [5] Integrative interaction of emotional speech in audio-visual modality
    Dong, Haibin
    Li, Na
    Fan, Lingzhong
    Wei, Jianguo
    Xu, Junhai
    [J]. Frontiers in Neuroscience, 2022, 16
  • [6] Emotional Audio-Visual Speech Synthesis Based on PAD
    Jia, Jia
    Zhang, Shen
    Meng, Fanbo
    Wang, Yongxin
    Cai, Lianhong
    [J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19 (3): 570 - 582
  • [7] An audio-visual distance for audio-visual speech vector quantization
    Girin, L.
    Foucher, E.
    Feng, G.
    [J]. 1998 IEEE Second Workshop on Multimedia Signal Processing, 1998: 523 - 528
  • [9] Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech
    Alm, M.
    [J]. The Journal of the Acoustical Society of America, 2013, 134
  • [10] AUDIO-VISUAL RECOGNITION OF OVERLAPPED SPEECH FOR THE LRS2 DATASET
    Yu, Jianwei
    Zhang, Shi-Xiong
    Wu, Jian
    Ghorbani, Shahram
    Wu, Bo
    Kang, Shiyin
    Liu, Shansong
    Liu, Xunying
    Meng, Helen
    Yu, Dong
    [J]. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020: 6984 - 6988