Layer-Wise Representation Fusion for Compositional Generalization

Cited by: 0
Authors
Zheng, Yafang [1 ,2 ]
Lin, Lei [1 ,2 ,3 ]
Li, Shuangtao [1 ,2 ]
Yuan, Yuxuan [1 ,2 ]
Lai, Zhaohong [1 ,2 ]
Liu, Shan [1 ,2 ]
Fu, Biao [1 ,2 ]
Chen, Yidong [1 ,2 ]
Shi, Xiaodong [1 ,2 ]
Affiliations
[1] Xiamen University, School of Informatics, Department of Artificial Intelligence, Xiamen, People's Republic of China
[2] Xiamen University, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage, Ministry of Culture and Tourism, Xiamen, People's Republic of China
[3] Kuaishou Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for this failure is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and the decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics rather than exploring the causes of this representation entanglement (RE) problem in order to solve it. We explain why RE arises by analyzing how representations evolve from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal. Code is available at https://github.com/thinkaboutzero/LRF.
Pages: 19706-19714
Page count: 9
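
As a rough illustration of the fuse-attention idea the abstract describes, below is a minimal PyTorch sketch. It assumes one plausible formulation, in which each token attends over its own representation at every earlier depth; the class name FuseAttention, the single-query-over-depth layout, and the surrounding encoder loop are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class FuseAttention(nn.Module):
    """Sketch of a layer-wise fusion module: each token attends over the
    stack of all previous layers' representations, so deeper layers can
    recover information that shallow residual connections dilute."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, history: list) -> torch.Tensor:
        # x:       (batch, seq_len, d_model), the current layer's output
        # history: outputs of all earlier layers (including the embeddings)
        b, t, d = x.shape
        # Stack history along a "depth" axis: (batch * seq_len, depth, d_model)
        mem = torch.stack(history, dim=2).reshape(b * t, len(history), d)
        q = x.reshape(b * t, 1, d)                  # one query per token
        fused, _ = self.attn(q, mem, mem)           # attend over depth
        return self.norm(x + fused.reshape(b, t, d))  # residual + norm


# Toy usage inside an encoder loop (all names here are illustrative):
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(512, 8, batch_first=True) for _ in range(6)])
fusers = nn.ModuleList([FuseAttention(512) for _ in range(6)])

x = torch.randn(2, 10, 512)            # embedded input sequence
history = [x]
for layer, fuser in zip(layers, fusers):
    x = fuser(layer(x), history)       # fuse earlier layers back in
    history.append(x)
```

This sketch only conveys the shape of the computation: deeper layers regain direct access to shallower representations instead of relying on the per-layer residual path alone.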