Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation

Cited by: 0
Authors
Wang, Ke [1 ,2 ]
Chen, Guandan [3 ]
Huang, Zhongqiang [3 ]
Wan, Xiaojun [1 ,2 ]
Huang, Fei [3 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the near-human performance already achieved on formal texts such as news articles, neural machine translation still struggles with "user-generated" texts, which exhibit diverse linguistic phenomena but lack large-scale, high-quality parallel corpora. To address this problem, we propose a counterfactual domain adaptation method that better leverages both large-scale source-domain data (formal texts) and small-scale target-domain data (informal texts). Specifically, by considering effective counterfactual conditions (the concatenations of source-domain texts and the target-domain tag), we construct counterfactual representations to fill the latent space of the target domain, which is sparse because of the small amount of data; that is, we bridge the gap between the source-domain data and the target-domain data. Experiments on English-to-Chinese and Chinese-to-English translation tasks show that our method outperforms the base model trained only on the informal corpus by a large margin, and consistently surpasses different baseline methods by 1.12 to 4.34 BLEU points on different datasets. Furthermore, we show that our method achieves competitive performance on cross-domain language translation on four language pairs.
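The counterfactual condition named in the abstract, a source-domain text concatenated with the target-domain tag, can be illustrated with a minimal sketch. This is not the authors' implementation. The tag tokens `<formal>` and `<informal>`, the choice to prepend the tag, and the names `TaggedExample`, `tag`, and `build_inputs` are all assumptions made for illustration. The sketch covers only how the tagged inputs might be assembled, not how the counterfactual representations are built or trained.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tag tokens; the abstract does not specify their form.
FORMAL_TAG = "<formal>"      # source domain (formal texts)
INFORMAL_TAG = "<informal>"  # target domain (informal texts)


@dataclass(frozen=True)
class TaggedExample:
    text: str             # domain tag concatenated with the sentence
    counterfactual: bool  # True when the tag does not match the sentence's own domain


def tag(sentence: str, domain_tag: str) -> str:
    """Concatenate a domain tag and a sentence (prepending is an assumption)."""
    return f"{domain_tag} {sentence.strip()}"


def build_inputs(formal: List[str], informal: List[str]) -> List[TaggedExample]:
    """Pair every sentence with its own tag; additionally pair each formal
    sentence with the informal tag to form a counterfactual condition."""
    examples: List[TaggedExample] = []
    for sentence in formal:
        examples.append(TaggedExample(tag(sentence, FORMAL_TAG), False))
        examples.append(TaggedExample(tag(sentence, INFORMAL_TAG), True))
    for sentence in informal:
        examples.append(TaggedExample(tag(sentence, INFORMAL_TAG), False))
    return examples


if __name__ == "__main__":
    for example in build_inputs(["The meeting is postponed."], ["gonna be late lol"]):
        print(example)
```

In this sketch, only formal sentences receive a mismatched tag, mirroring the abstract's description of combining source-domain texts with the target-domain tag.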
Pages: 13970-13978
Page count: 9