ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation

Cited by: 0

Authors:

Lovenia, Holy [1]
Cahyawijaya, Samuel [1]
Winata, Genta Indra [1]
Xu, Peng [1]
Yan, Xu [1]
Liu, Zihan [1]
Frieske, Rita [1]
Yu, Tiezheng [1]
Dai, Wenliang [1]
Barezi, Elham J. [1]
Chen, Qifeng [1]
Ma, Xiaojuan [1]
Shi, Bertram E. [1]
Fung, Pascale [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

Source:

LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2022

Keywords:

code-switching; corpus; bilingual; speech; dialogue; Mandarin Chinese; English; low-resource; SPEECH; READ;

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Code-switching is a speech phenomenon occurring when a speaker switches language during a conversation. Despite the spontaneous nature of code-switching in conversational spoken language, most existing works collect code-switching data from read speech instead of spontaneous speech. ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality Mandarin Chinese-English code-switching corpus built on spontaneous multi-turn conversational dialogue sources collected in Hong Kong. We report ASCEND's design and procedure for collecting the speech data, including annotations. ASCEND consists of 10.62 hours of clean speech, collected from 23 bilingual speakers of Chinese and English. Furthermore, we conduct baseline experiments using pre-trained wav2vec 2.0 models, achieving a best performance of 22.69% character error rate and 27.05% mixed error rate.


Pages: 7259-7268

Page count: 10

46 items in total

[21] TweetTaglish: A Dataset for Investigating Tagalog-English Code-Switching
Herrera, Megan
Aich, Ankit
Parde, Natalie
LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 2090 - 2097
[22] Intra-speaker variation in Chinese-English code-switching: The interaction between cognitive and contextual factors
Liu, Hong
INTERNATIONAL JOURNAL OF BILINGUALISM, 2018, 22 (06) : 740 - 762
[23] Attitude-Behavior Relation and Language Use: Chinese-English Code-Switching and Code-Mixing Among Chinese Undergraduate Students
Moradi, Hamzeh
Chen, Jianbo
SAGE OPEN, 2022, 12 (04):
[24] A Functional study on Chinese-English Code Switching in Network Chat
郑伟
校园英语, 2012, (11) : 200 - 200
[25] Code-Switching by Spanish-English Bilingual Children in a Code-Switching Conversation Sample: Roles of Language Proficiency, Interlocutor Behavior, and Parent-Reported Code-Switching Experience
Gross, Megan C.
Gonzalez, Ada C. Lopez
Girardin, Maria G.
Almeida, Adriana M.
LANGUAGES, 2022, 7 (04)
[26] Syntactic Analysis of Chinese/English Code-switching in Besieged City
Shangrao Normal College
科技信息(学术研究), 2007, (15) : 120 - 121
[27] Chinese-English Phone Set Construction for Code-Switching ASR Using Acoustic and DNN-Extracted Articulatory Features
Wu, Chung-Hsien
Shen, Han-Ping
Yang, Yan-Ting
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04) : 858 - 862
[28] Insertional code-switching as interactional resource in Mandarin-English bilingual conversation
Wang, Wei
INTERNATIONAL JOURNAL OF BILINGUALISM, 2024,
[29] Voice onset time in Spanish-English spontaneous code-switching
Piccinini, Page
Arvaniti, Amelia
JOURNAL OF PHONETICS, 2015, 52 : 121 - 137
[30] A Study of the Pragmatic Adaptation of the Teachers' English/Chinese Code-switching in Class
陈超
青春岁月, 2013, (16) : 187 - 187
