MYCanCor: A Video Corpus of spoken Malaysian Cantonese

Cited by: 0
Author
Liesenfeld, Andreas [1]
Affiliation
[1] Nanyang Technol Univ, Singapore, Singapore
Keywords
Malaysian Cantonese; spoken corpora; naturally-occurring talk-in-interaction
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The Malaysia Cantonese Corpus (MYCanCor) is a collection of recordings of Malaysian Cantonese speech, mainly collected in Perak, Malaysia. The corpus consists of around 20 hours of video recordings of spontaneous talk-in-interaction (56 settings), typically involving 2-4 speakers. A short scene description and basic speaker information are provided for each recording. The corpus is transcribed in CHAT (minCHAT) format and presented in traditional Chinese characters (UTF-8) using the Hong Kong Supplementary Character Set (HKSCS). MYCanCor is expected to be a useful resource for researchers interested in any aspect of spoken language processing or Chinese multimodal corpora.
Pages: 764-767
Number of pages: 4