A Generative Data Augmentation Model for Enhancing Chinese Dialect Pronunciation Prediction

Cited by: 3
Authors
Lin, Chu-Cheng [1]
Tsai, Richard Tzong-Han [2]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10617, Taiwan
[2] Yuan Ze Univ, Dept Comp Sci & Engn, Zhongli 320, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / Issue 04
Keywords
Chinese dialects; data augmentation; generative model; pronunciation database
DOI
10.1109/TASL.2011.2172424
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Most spoken Chinese dialects lack comprehensive digital pronunciation databases, which are crucial for speech processing tasks. Given complete pronunciation databases for related dialects, one can use supervised learning techniques to predict a Chinese character's pronunciation in a target dialect from the character's features and its pronunciation in the related dialects. Unfortunately, Chinese dialect pronunciation databases are far from complete. We propose a novel generative model that uses both existing dialect pronunciation data and medieval rime books to discover patterns shared across multiple dialects. The proposed model can fill in missing dialectal pronunciations based on existing dialect pronunciation tables (even if incomplete) and the pronunciation data in rime books. The augmented pronunciation database can then be used in supervised learning settings. We evaluate prediction accuracy in terms of phonological features such as tone, initial phoneme, and final phoneme. For each character, the features are also evaluated as a whole, giving the overall pronunciation feature accuracy (OPFA). In the first experiment, we show that adding features from dialectal pronunciation data to our baseline rime-book model dramatically improves the OPFA of the support vector machine (SVM) model. In the second experiment, we compare the performance of the SVM model using phonological features from closely related dialects with that of the model using phonological features from dialects that are not closely related. The results show that features from closely related dialects yield higher accuracy. In the third experiment, we show that using our proposed data augmentation model to fill in missing data can increase the SVM model's OPFA by up to 7.6%.
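The abstract does not give a formula for OPFA. One plausible reading, assumed here and not confirmed by the record, is that a character counts as correct only when every phonological feature (tone, initial, final, and so on) is predicted correctly. The function name `opfa`, the feature keys, and the toy values below are hypothetical and serve only to illustrate that reading.

```python
from typing import Dict, List

# One pronunciation is a mapping from feature name to value,
# e.g. {"tone": "1", "initial": "k", "final": "a"} (hypothetical encoding).
Pronunciation = Dict[str, str]


def opfa(predicted: List[Pronunciation], gold: List[Pronunciation]) -> float:
    """Share of characters whose features are ALL predicted correctly.

    Assumption: this all-features-must-match rule is one interpretation of
    "overall pronunciation feature accuracy"; the paper may define it differently.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predicted, gold)
        if all(pred.get(name) == value for name, value in ref.items())
    )
    return correct / len(gold)


# Toy data: the second character has a wrong tone, so it does not count.
gold_rows = [
    {"tone": "1", "initial": "k", "final": "a"},
    {"tone": "2", "initial": "t", "final": "o"},
]
pred_rows = [
    {"tone": "1", "initial": "k", "final": "a"},
    {"tone": "3", "initial": "t", "final": "o"},
]
score = opfa(pred_rows, gold_rows)
print(score)
```

Under this reading a single wrong feature makes the whole character count as an error, which makes the measure stricter than per-feature accuracy.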
Pages: 1109-1117
Page count: 9