A Generative Data Augmentation Model for Enhancing Chinese Dialect Pronunciation Prediction

Cited by: 3
Authors
Lin, Chu-Cheng [1]
Tsai, Richard Tzong-Han [2]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10617, Taiwan
[2] Yuan Ze Univ, Dept Comp Sci & Engn, Zhongli 320, Taiwan
Keywords
Chinese dialects; data augmentation; generative model; pronunciation database
DOI
10.1109/TASL.2011.2172424
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Most spoken Chinese dialects lack comprehensive digital pronunciation databases, which are crucial for speech processing tasks. Given complete pronunciation databases for related dialects, one can use supervised learning techniques to predict a Chinese character's pronunciation in a target dialect from the character's features and its pronunciations in those related dialects. Unfortunately, Chinese dialect pronunciation databases are far from complete. We propose a novel generative model that uses both existing dialect pronunciation data and medieval rime books to discover patterns shared across multiple dialects. The proposed model can fill in missing dialectal pronunciations based on existing dialect pronunciation tables (even incomplete ones) and the pronunciation data in rime books. The augmented pronunciation database can then be used in supervised learning settings. We evaluate prediction accuracy in terms of phonological features such as tone, initial phoneme, and final phoneme. For each character, the features are evaluated as a whole, a measure we call overall pronunciation feature accuracy (OPFA). In the first experiment, adding features from dialectal pronunciation data to our baseline rime-book model dramatically improves the OPFA of the support vector machine (SVM) model. In the second experiment, we compare the SVM model using phonological features from closely related dialects with the same model using phonological features from less closely related dialects; features from closely related dialects yield higher accuracy. In the third experiment, we show that using our proposed data augmentation model to fill in missing data can increase the SVM model's OPFA by up to 7.6%.
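The feature-based evaluation described in the abstract can be illustrated with a minimal sketch. It assumes each pronunciation is a record of three phonological features (tone, initial, final) and computes per-feature accuracy plus the share of characters for which every feature is predicted correctly. The abstract does not give the exact OPFA formula, so the whole-character measure here is only one plausible reading; the feature names, function names, and toy data are all hypothetical.

```python
from typing import Dict, List

Pronunciation = Dict[str, str]

# Assumed feature set; the abstract lists "tone, initial phoneme, final phoneme, etc."
FEATURES = ("tone", "initial", "final")


def _check(gold: List[Pronunciation], pred: List[Pronunciation]) -> None:
    """Reject empty or mismatched inputs before computing any ratio."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")


def per_feature_accuracy(gold: List[Pronunciation],
                         pred: List[Pronunciation]) -> Dict[str, float]:
    """Fraction of characters whose predicted value matches, per feature."""
    _check(gold, pred)
    return {
        f: sum(g[f] == p[f] for g, p in zip(gold, pred)) / len(gold)
        for f in FEATURES
    }


def whole_character_accuracy(gold: List[Pronunciation],
                             pred: List[Pronunciation]) -> float:
    """Fraction of characters with every feature predicted correctly.

    Hypothetical stand-in for a whole-pronunciation score; not the
    paper's confirmed OPFA definition.
    """
    _check(gold, pred)
    hits = sum(all(g[f] == p[f] for f in FEATURES) for g, p in zip(gold, pred))
    return hits / len(gold)


# Toy data (invented for illustration): four characters, two with one error each.
GOLD = [
    {"tone": "1", "initial": "k", "final": "a"},
    {"tone": "2", "initial": "t", "final": "i"},
    {"tone": "3", "initial": "p", "final": "u"},
    {"tone": "4", "initial": "s", "final": "an"},
]
PRED = [
    {"tone": "1", "initial": "k", "final": "a"},
    {"tone": "2", "initial": "t", "final": "e"},   # final is wrong
    {"tone": "1", "initial": "p", "final": "u"},   # tone is wrong
    {"tone": "4", "initial": "s", "final": "an"},
]

if __name__ == "__main__":
    print(per_feature_accuracy(GOLD, PRED))
    print(whole_character_accuracy(GOLD, PRED))
```

Reporting both numbers separates a model that gets most individual features right from one that gets whole pronunciations right; under this reading, a single wrong feature makes the character count as a miss.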
Pages: 1109-1117
Page count: 9