Inherit or discard: learning better domain-specific child networks from the general domain for multi-domain NMT

Cited by: 0
Authors
Xu, Jinlei [1, 2]
Wen, Yonghua [1, 2]
Xiang, Yan [1, 2]
Jiang, Shuting [1, 2]
Huang, Yuxin [1, 2]
Yu, Zhengtao [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Fac Informat Engn & Automat, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Yunnan Key Lab Artificial Intelligence, Kunming 650500, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-domain NMT; Parameter Interference; Parameter Inheritance; Gradient similarity;
DOI
10.1007/s13042-024-02253-w
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-domain NMT aims to develop a single parameter-sharing model that translates both the general domain and specific domains such as biology and law, but such models often struggle with the parameter interference problem. Existing approaches typically tackle this issue by learning a domain-specific sub-network for each domain, treating all domains equally, but they ignore the significant data imbalance across domains. For instance, the training data for the general domain is often ten times that of the biological domain. In this paper, we observe a natural similarity between the general and specific domains, such as shared vocabulary or similar sentence structure. We propose a novel parameter inheritance strategy that adaptively learns domain-specific child networks from the general domain. Our approach uses gradient similarity as the criterion for deciding which parameters a specific domain should inherit from the general domain and which it should discard. Extensive experiments on several multi-domain NMT corpora demonstrate that our method significantly outperforms several strong baselines. In addition, our method generalizes well when adapting to few-shot multi-domain NMT scenarios. Further investigation shows that our method is interpretable: the parameters that a child network inherits from the general domain depend on how closely the specific domain is related to the general domain.
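The inherit-or-discard criterion described in the abstract can be illustrated with a minimal sketch. The abstract does not state how gradient similarity is measured or at what granularity, so everything below is an assumption made for illustration: cosine similarity as the measure, one decision per named parameter group, a threshold of 0.0, and the hypothetical names `cosine_similarity` and `inheritance_mask`. It is not the authors' implementation, and the gradient values are made-up toy numbers.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length gradient vectors.

    Returns 0.0 if either vector is all zeros (assumed convention).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inheritance_mask(general_grads, domain_grads, threshold=0.0):
    """Map each parameter-group name to True (inherit) or False (discard).

    A group is inherited when the general-domain and specific-domain
    gradients point in a similar direction (similarity above the
    assumed threshold); otherwise it is discarded.
    """
    return {
        name: cosine_similarity(general_grads[name], domain_grads[name]) > threshold
        for name in general_grads
    }


# Toy gradients (hypothetical values, illustration only).
general_grads = {
    "encoder.layer0": [0.5, -0.2, 0.1],
    "decoder.layer0": [0.3, 0.4, -0.1],
}
domain_grads = {
    "encoder.layer0": [0.4, -0.1, 0.2],    # roughly aligned with the general domain
    "decoder.layer0": [-0.3, -0.5, 0.2],   # conflicts with the general domain
}

mask = inheritance_mask(general_grads, domain_grads)
print(mask)
```

In this toy setup the encoder group would be inherited and the decoder group discarded, which mirrors the intuition that a child network keeps only those general-domain parameters whose updates agree with the specific domain.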
Pages: 5439-5452
Number of pages: 14