ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Cited by: 0
Authors
Lovenia, Holy [1]
Cahyawijaya, Samuel [1]
Winata, Genta Indra [1]
Xu, Peng [1]
Yan, Xu [1]
Liu, Zihan [1]
Frieske, Rita [1]
Yu, Tiezheng [1]
Dai, Wenliang [1]
Barezi, Elham J. [1]
Chen, Qifeng [1]
Ma, Xiaojuan [1]
Shi, Bertram E. [1]
Fung, Pascale [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords
code-switching; corpus; bilingual; speech; dialogue; Mandarin Chinese; English; low-resource; SPEECH; READ;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong. We report ASCEND's design and procedure for collecting the speech data, including annotations. ASCEND consists of 10.62 hours of clean speech, collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate and 27.05% mixed error rate.
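The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common convention for Mandarin-English code-switched text: character error rate (CER) is the edit distance over individual characters, and mixed error rate (MER) is the edit distance over a mixed token sequence in which each Chinese character and each English word counts as one token. The function names and the tokenization regex below are illustrative assumptions, not the authors' evaluation code.

```python
import re


def mixed_tokens(text):
    """Split text into mixed tokens (assumed convention): each CJK
    character is one token, each run of Latin letters/digits is one token."""
    return re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9']+", text.lower())


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cer(ref, hyp):
    """Character error rate: every non-space character is a token."""
    def chars(s):
        return [c for c in s.lower() if not c.isspace()]
    return error_rate(chars(ref), chars(hyp))


def mer(ref, hyp):
    """Mixed error rate: Chinese characters plus English words as tokens."""
    return error_rate(mixed_tokens(ref), mixed_tokens(hyp))


if __name__ == "__main__":
    reference = "我 想 要 一个 coffee"
    hypothesis = "我 想 要 一个 copy"
    print(f"CER = {cer(reference, hypothesis):.4f}")
    print(f"MER = {mer(reference, hypothesis):.4f}")
```

Under this convention a single misrecognized English word costs one error in MER but several in CER, which is one reason the two scores reported for the same system can differ.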
Pages: 7259-7268
Page count: 10