A CORPUS FOR THE STUDY ON THE ASSESSMENT OF MANDARIN PRONUNCIATION OF TIBETAN SPEAKERS

Authors
Gan, Z. [1 ,3 ]
Jiang, J. [1 ]
Yan, Y. [1 ]
Yang, H. [1 ,2 ,4 ]
Affiliations
[1] Northwest Normal Univ, Coll Phys & Elect Engn, Lanzhou, Gansu, Peoples R China
[2] Northwest Normal Univ, Sch Educ Technol, Lanzhou, Gansu, Peoples R China
[3] Engn Res Ctr Gansu Prov Intelligent Informat Tech, Lanzhou, Gansu, Peoples R China
[4] Natl & Prov Joint Engn Lab Learning Anal Technol, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Tibetan speaker Mandarin; pronunciation assessment; audio recording dataset; SAMPA-TSC;
DOI
Not available
Chinese Library Classification
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
Tibetan speakers consistently make certain fixed types of pronunciation errors when they speak Mandarin, owing to the pronunciation habits of their native language. An assessment system that can detect mispronunciations and measure the overall similarity of syllables or phonemes in the Mandarin spoken by Tibetan speakers is therefore needed to help learners improve their Mandarin. Studying the assessment of Tibetan speakers' Mandarin pronunciation requires a dedicated corpus, but no such corpus currently exists for this research task. We create one by integrating the linguistic theory of Tibetan and Chinese with speech signal processing and machine learning. In this work, we record non-standard Mandarin audio from Tibetan students together with standard Mandarin audio. Both sets of recordings share the same text, which was designed by analyzing and comparing the pronunciation characteristics of Tibetan and Chinese. The recordings total 5.5 hours and contain 1000 paragraphs, covering 377 toneless syllables and all phonemes of standard Chinese. We then describe the recording environment and recording equipment. Furthermore, we define rules for annotating the recordings in a hierarchical format using the PRAAT software: the first layer is the phrase layer, marked with Chinese characters; the second layer is the syllable layer, marked with pinyin; the third layer is the phoneme layer, labeled with the Speech Assessment Methods Phonetic Alphabet-Tibetan Standard Chinese (SAMPA-TSC), which we designed ourselves. Finally, we evaluate the corpus creation in four aspects (coverage, completeness, quality, and reusability) and describe the potential applications of the dataset.
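The three-layer annotation described in the abstract can be pictured as nested time intervals. The following is a minimal sketch, not the authors' tooling: the `Interval` class, the `check_tiers` helper, the timestamps, and the labels are all hypothetical, and the phoneme labels are placeholders rather than actual SAMPA-TSC symbols. It only illustrates one plausible consistency check, namely that every syllable interval falls inside a phrase interval and every phoneme interval falls inside a syllable interval.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """One labeled span on an annotation tier (times in seconds)."""
    start: float
    end: float
    label: str


def nested_in(child: Interval, parents: Sequence[Interval], tol: float = 1e-6) -> bool:
    """Return True if child lies inside at least one parent interval."""
    return any(
        p.start - tol <= child.start and child.end <= p.end + tol
        for p in parents
    )


def check_tiers(
    phrases: Sequence[Interval],
    syllables: Sequence[Interval],
    phonemes: Sequence[Interval],
) -> List[Tuple[str, str]]:
    """List (tier, label) pairs for intervals not contained in the tier above."""
    problems = []
    for s in syllables:
        if not nested_in(s, phrases):
            problems.append(("syllable", s.label))
    for p in phonemes:
        if not nested_in(p, syllables):
            problems.append(("phoneme", p.label))
    return problems


# Hypothetical annotation of one short phrase; all values are made up.
phrases = [Interval(0.00, 0.60, "你好")]                      # layer 1: Chinese characters
syllables = [Interval(0.00, 0.28, "ni"),                      # layer 2: pinyin
             Interval(0.28, 0.60, "hao")]
phonemes = [Interval(0.00, 0.10, "n"), Interval(0.10, 0.28, "i"),   # layer 3: placeholder
            Interval(0.28, 0.40, "h"), Interval(0.40, 0.60, "au")]  # phoneme symbols

print(check_tiers(phrases, syllables, phonemes))
```

A check of this kind could be run after exporting each tier from an annotation file; how the corpus itself stores or validates its tiers is not stated in the abstract.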
Pages: 7840-7848
Page count: 9