A Review of the Mandarin-English Code-switching Corpus: SEAME

Cited by: 0
Authors
Lee, Grandee [1]
Ho, Thi-Nga [2]
Chng, Eng-Siong [2]
Li, Haizhou [1]
Affiliations
[1] Natl Univ Singapore, Elect & Comp Engn Dept, Singapore 117583, Singapore
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Keywords
Code-switching corpus; Mandarin English corpus; SEAME;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we report the development of the South East Asia Mandarin-English (SEAME) corpus, which included 63 hours of transcribed spontaneous Mandarin-English code-switching speech in its first release, and an update that adds 129 hours of transcribed speech. The corpus was developed for code-switching speech recognition research, such as LVCSR, language recognition, and language segmentation. It has been publicly available through the LDC since 2015. The corpus was recorded in unscripted interview and conversation settings and therefore consists of spontaneous speech. This paper presents comprehensive statistics and an analysis of the corpus after the update in terms of its composition, speaker profile, and code-switching characteristics. It also reviews the corpus's suitability for various areas of code-switching research and discusses possible further developments.
Pages: 210-213
Page count: 4
Related papers
50 items in total
  • [21] Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition
    Zhang, Shuai
    Yi, Jiangyan
    Tian, Zhengkun
    Tao, Jianhua
    Bai, Ye
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [22] A corpus investigation of the typology of code-switching between closely related languages: Data from Mandarin-Taiwanese code-switching
    Hsiao, Chien-Han
    INTERNATIONAL JOURNAL OF BILINGUALISM, 2024,
  • [23] An Empirical Study on Punctuation Restoration for English, Mandarin, and Code-Switching Speech
    Liu, Changsong
    Thi Nga Ho
    Chng, Eng Siong
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT II, 2023, 13996 : 286 - 296
  • [24] Improving End-to-End Modeling For Mandarin-English Code-Switching Using Lightweight Switch-Routing Mixture-of-Experts
    Tan, Fengyun
    Feng, Chaofeng
    Wei, Tao
    Gong, Shuai
    Leng, Jinqiang
    Chu, Wei
    Ma, Jun
    Wang, Shaojun
    Xiao, Jing
    INTERSPEECH 2023, 2023, : 4224 - 4228
  • [25] Language choice and code-switching in bilingual children's interaction under multilingual contexts: evidence from Mandarin-English bilingual preschoolers
    Zhang, Haijing
    Huang, Fangwei
    Wang, Cong
    INTERNATIONAL JOURNAL OF MULTILINGUALISM, 2024,
  • [26] Code-Switching in Early English
    Honkapohja, Alpo
    Wright, Laura
    JOURNAL OF HISTORICAL PRAGMATICS, 2013, 14 (02) : 321 - 327
  • [27] Lexical tonal effects in code-switching: A comparative study of Cantonese, Mandarin, and Vietnamese switching with English
    Li, Katrina Kechun
    Nguyen, Li
    Bryant, Christopher
    Yoo, Kayeon
    INTERNATIONAL JOURNAL OF BILINGUALISM, 2024, 28 (05) : 799 - 827
  • [28] Code-Switching in Early English
    Skaffari, Janne
    NEUPHILOLOGISCHE MITTEILUNGEN, 2013, 114 (01) : 121 - 124
  • [29] Code-switching Between Mandarin and Hainan Dialect
    洪丽娜
    海外英语, 2014, (14) : 251 - 252
  • [30] MERLIon CCS Challenge: A English-Mandarin code-switching child-directed speech corpus for language identification and diarization
    Chua, Victoria Y. H.
    Liu, Hexin
    Perera, Leibny Paola Garcia
    Woon, Fei Ting
    Wong, Jinyi
    Zhang, Xiangyu
    Khudanpur, Sanjeev
    Khong, Andy W. H.
    Dauwels, Justin
    Styles, Suzy J.
    INTERSPEECH 2023, 2023, : 4109 - 4113