Bridging the Domain Gap: Improve Informal Language Translation via Counterfactual Domain Adaptation

Authors
Wang, Ke [1, 2]
Chen, Guandan [3]
Huang, Zhongqiang [3]
Wan, Xiaojun [1, 2]
Huang, Fei [3]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Alibaba Grp, DAMO Acad, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite the near-human performance already achieved on formal texts such as news articles, neural machine translation still struggles with "user-generated" texts, which exhibit diverse linguistic phenomena but lack large-scale, high-quality parallel corpora. To address this problem, we propose a counterfactual domain adaptation method that better leverages both large-scale source-domain data (formal texts) and small-scale target-domain data (informal texts). Specifically, by considering effective counterfactual conditions (the concatenations of source-domain texts and the target-domain tag), we construct counterfactual representations to fill the sparse latent space of the target domain caused by the small amount of data, thereby bridging the gap between the source-domain data and the target-domain data. Experiments on English-to-Chinese and Chinese-to-English translation tasks show that our method outperforms the base model trained only on the informal corpus by a large margin, and consistently surpasses different baseline methods by +1.12 to 4.34 BLEU points on different datasets. Furthermore, we show that our method achieves competitive performance on cross-domain language translation on four language pairs.
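The counterfactual condition named in the abstract (a source-domain text concatenated with the target-domain tag) can be illustrated with a minimal Python sketch. This covers only how such tagged inputs might be assembled, not the paper's representation learning or training objective. The tag strings `<formal>` and `<informal>`, and the helper names `tag_sentence` and `build_inputs`, are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical tag tokens; the abstract does not give the actual tag format.
SOURCE_TAG = "<formal>"
TARGET_TAG = "<informal>"


def tag_sentence(text: str, tag: str) -> str:
    """Concatenate a domain tag with a sentence (tag first, by assumption)."""
    return f"{tag} {text}"


def build_inputs(
    source_domain: List[str], target_domain: List[str]
) -> List[Tuple[str, str]]:
    """Return (kind, tagged_text) pairs.

    "factual": a text paired with the tag of its own domain.
    "counterfactual": a source-domain text paired with the target-domain tag.
    """
    examples: List[Tuple[str, str]] = []
    for sentence in source_domain:
        examples.append(("factual", tag_sentence(sentence, SOURCE_TAG)))
        examples.append(("counterfactual", tag_sentence(sentence, TARGET_TAG)))
    for sentence in target_domain:
        examples.append(("factual", tag_sentence(sentence, TARGET_TAG)))
    return examples
```

Under these assumptions, each formal sentence yields one factual and one counterfactual input, so the small informal corpus is supplemented by many inputs that carry the informal tag.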
Pages: 13970-13978
Number of pages: 9