A Review of the Mandarin-English Code-switching Corpus: SEAME

Cited by: 0
Authors
Lee, Grandee [1]
Ho, Thi-Nga [2]
Chng, Eng-Siong [2]
Li, Haizhou [1]
Affiliations
[1] Natl Univ Singapore, Elect & Comp Engn Dept, Singapore 117583, Singapore
[2] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore 639798, Singapore
Keywords
Code-switching corpus; Mandarin English corpus; SEAME
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we report the development of the South East Asia Mandarin-English (SEAME) corpus, which comprises 63 hours of transcribed spontaneous Mandarin-English code-switching speech in its first release and an additional 129 hours of transcribed speech in a later update. The corpus was developed for code-switching speech recognition research, such as large-vocabulary continuous speech recognition (LVCSR), language recognition, and language segmentation. It has been publicly available through the LDC since 2015. The corpus was recorded in unscripted interview and conversation settings and therefore consists of spontaneous speech. This paper presents comprehensive statistics and an analysis of the corpus after the update in terms of its composition, speaker profile, and code-switching characteristics. It also reviews the suitability of the corpus for various areas of code-switching research and discusses possible further developments.
Pages: 210-213
Number of pages: 4
Related papers
50 in total
  • [1] A Mandarin-English Code-Switching Corpus
    Li, Ying
    Yu, Yue
    Fung, Pascale
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2515 - 2519
  • [2] Mandarin-English code-switching speech corpus in South-East Asia: SEAME
    Lyu, Dau-Cheng
    Tan, Tien-Ping
    Chng, Eng-Siong
    Li, Haizhou
    LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (03) : 581 - 600
  • [3] SEAME: a Mandarin-English Code-switching Speech Corpus in South-East Asia
    Lyu, Dau-Cheng
    Tan, Tien-Ping
    Chng, Eng-Siong
    Li, Haizhou
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1986 - +
  • [4] Mandarin–English code-switching speech corpus in South-East Asia: SEAME
    Dau-Cheng Lyu
    Tien-Ping Tan
    Eng-Siong Chng
    Haizhou Li
    Language Resources and Evaluation, 2015, 49 : 581 - 600
  • [5] Mandarin-English Code-switching Speech Recognition
    Xu, Haihua
    Van Tung Pham
    Kyaw, Zin Tun
    Lim, Zhi Hao
    Chng, Eng Siong
    Li, Haizhou
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 554 - 555
  • [6] Pronunciation augmentation for Mandarin-English code-switching speech recognition
    Long, Yanhua
    Wei, Shuang
    Lian, Jie
    Li, Yijie
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2021, 2021 (01)
  • [7] Pronunciation augmentation for Mandarin-English code-switching speech recognition
    Yanhua Long
    Shuang Wei
    Jie Lian
    Yijie Li
    EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [8] TALCS: AN OPEN-SOURCE MANDARIN-ENGLISH CODE-SWITCHING CORPUS AND A SPEECH RECOGNITION BASELINE
    Li, Chengfei
    Deng, Shuhao
    Wang, Yaoping
    Wang, Guangjing
    Gong, Yaguang
    Chen, Changbin
    Bai, Jinfeng
    INTERSPEECH 2022, 2022, : 1741 - 1745
  • [9] Insertional code-switching as interactional resource in Mandarin-English bilingual conversation
    Wang, Wei
    INTERNATIONAL JOURNAL OF BILINGUALISM, 2024,
  • [10] NON-AUTOREGRESSIVE MANDARIN-ENGLISH CODE-SWITCHING SPEECH RECOGNITION
    Chuang, Shun-Po
    Chang, Heng-Jui
    Huang, Sung-Feng
    Lee, Hung-yi
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 465 - 472