SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia

Cited by: 0
Authors
Lyu, Dau-Cheng [1, 4]
Tan, Tien-Ping [2]
Chng, Eng-Siong [1, 4]
Li, Haizhou [1, 3, 4]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[2] Univ Sains Malaysia, Sch Comp Sci, George Town 11800, Malaysia
[3] Inst Infocomm Res, Singapore 138632, Singapore
[4] Nanyang Technol Univ, Temasek Labs, Singapore 639798, Singapore
Keywords
DOI
Not available
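Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification code
0808; 0809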
Abstract
In Singapore and Malaysia, people often mix Mandarin and English within a single sentence. We call such sentences intra-sentential code-switch sentences. In this paper, we report on the development of SEAME, a Mandarin-English code-switching spontaneous speech corpus. The corpus is developed as part of a multilingual speech recognition project and will be used to examine how Mandarin-English code-switching occurs in spoken language in South-East Asia. It can also provide insights into the development of large vocabulary continuous speech recognition (LVCSR) for code-switching speech. The collected corpus consists of intra-sentential code-switching utterances recorded in both interview and conversational settings. This paper describes the corpus design and the analysis of the collected corpus.
Pages: 1986 / +
Page count: 2