A Generative Data Augmentation Model for Enhancing Chinese Dialect Pronunciation Prediction

Cited by: 3
Authors
Lin, Chu-Cheng [1]
Tsai, Richard Tzong-Han [2]
Affiliations
[1] Natl Taiwan Univ, Dept Comp Sci & Informat Engn, Taipei 10617, Taiwan
[2] Yuan Ze Univ, Dept Comp Sci & Engn, Zhongli 320, Taiwan
Keywords
Chinese dialects; data augmentation; generative model; pronunciation database
DOI
10.1109/TASL.2011.2172424
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Most spoken Chinese dialects lack comprehensive digital pronunciation databases, which are crucial for speech processing tasks. Given complete pronunciation databases for related dialects, one can use supervised learning techniques to predict a Chinese character's pronunciation in a target dialect from the character's features and its pronunciation in other related dialects. Unfortunately, Chinese dialect pronunciation databases are far from complete. We propose a novel generative model that uses both existing dialect pronunciation data and medieval rime books to discover patterns shared across multiple dialects. The proposed model can fill in missing dialectal pronunciations based on existing dialect pronunciation tables (even if incomplete) and the pronunciation data in rime books. The augmented pronunciation database can then be used in supervised learning settings. We evaluate prediction accuracy in terms of phonological features, such as tone, initial phoneme, and final phoneme. For each character, the features are also evaluated as a whole, giving the overall pronunciation feature accuracy (OPFA). In the first experiment, we show that adding features from dialectal pronunciation data to our baseline rime-book model dramatically improves the OPFA of a support vector machine (SVM) model. In the second experiment, we compare the performance of the SVM model using phonological features from closely related dialects with that of the model using phonological features from non-closely related dialects. The results show that features from closely related dialects yield higher accuracy. In the third experiment, we show that using our proposed data augmentation model to fill in missing data can increase the SVM model's OPFA by up to 7.6%.
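The abstract does not give a formula for OPFA, so the following Python sketch shows only one plausible reading: a character counts as correct when every phonological feature of the predicted pronunciation matches the reference. The function name `opfa`, the feature keys, and the sample values are all hypothetical placeholders and are not taken from the paper.

```python
from typing import Dict, List

# A pronunciation is modeled as a mapping from feature name to value.
# The keys used here ("initial", "final", "tone") are illustrative assumptions.
Pronunciation = Dict[str, str]


def opfa(predicted: List[Pronunciation], gold: List[Pronunciation]) -> float:
    """Fraction of characters whose predicted features ALL match the reference.

    Assumption: this all-features-correct reading of "evaluated on the whole"
    is a guess; the paper's exact definition may differ.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    # Dict equality holds only when every feature key and value agree.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Placeholder data for four characters (not real dialect pronunciations).
gold = [
    {"initial": "k", "final": "a", "tone": "1"},
    {"initial": "t", "final": "i", "tone": "2"},
    {"initial": "p", "final": "u", "tone": "3"},
    {"initial": "s", "final": "an", "tone": "4"},
]
predicted = [
    {"initial": "k", "final": "a", "tone": "1"},   # all features match
    {"initial": "t", "final": "i", "tone": "3"},   # tone differs
    {"initial": "p", "final": "u", "tone": "3"},   # all features match
    {"initial": "z", "final": "an", "tone": "4"},  # initial differs
]

score = opfa(predicted, gold)
print(score)
```

Under this reading, a prediction that gets the initial and final right but the tone wrong earns no credit, which makes OPFA stricter than any per-feature accuracy.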
Pages: 1109-1117
Page count: 9