New Datasets and Controllable Iterative Data Augmentation Method for Code-switching ASR Error Correction

Times cited: 0
Authors
Wan, Zhaohong [1 ,2 ]
Wan, Xiaojun [1 ,2 ]
Peng, Wei [3 ]
Li, Rongjun [3 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Huawei Technol, Artificial Intelligence Applicat Res Ctr, Shenzhen, Peoples R China
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023) | 2023年
Funding
National Key Research and Development Program; U.S. National Science Foundation;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the wide use of automatic speech recognition (ASR) systems, researchers are paying increasing attention to the ASR error correction task, which aims to improve the quality of recognition results. In particular, ASR in bilingual or multilingual settings, namely code-switching ASR, poses greater challenges and offers greater research value. In this paper, we first present code-switching ASR correction datasets obtained from solid ASR systems and automatic annotators. The datasets contain Chinese-English code-switching dialogues of bilingual speakers in Singapore, Malaysia, and Hong Kong. Based on this task, we propose a controllable iterative (CI) data augmentation method to improve the performance of mainstream ASR error correction systems. Starting from a small amount of training data, our method can iteratively produce abundant pseudo-parallel data from a monolingual corpus for Chinese-English code-switching ASR correction. Experimental results show that our method achieves the best performance compared with rule-based and back-translation-based data augmentation methods and with the large language model ChatGPT.
Pages: 8075-8087
Page count: 13