Layer-Wise Representation Fusion for Compositional Generalization

Cited by: 0
Authors
Zheng, Yafang [1 ,2 ]
Lin, Lei [1 ,2 ,3 ]
Li, Shuangtao [1 ,2 ]
Yuan, Yuxuan [1 ,2 ]
Lai, Zhaohong [1 ,2 ]
Liu, Shan [1 ,2 ]
Fu, Biao [1 ,2 ]
Chen, Yidong [1 ,2 ]
Shi, Xiaodong [1 ,2 ]
Affiliations
[1] Xiamen University, School of Informatics, Department of Artificial Intelligence, Xiamen, People's Republic of China
[2] Xiamen University, Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage, Ministry of Culture and Tourism, Xiamen, People's Republic of China
[3] Kuaishou Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
DOI
N/A
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for this failure is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and the decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics rather than exploring the causes of this representation entanglement (RE) problem in order to solve it. We explain why RE arises by analyzing how representations evolve from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal. Code is available at https://github.com/thinkaboutzero/LRF.
Pages: 19706-19714
Page count: 9
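
As a rough illustration of the fuse-attention idea the abstract describes, below is a minimal PyTorch sketch. It assumes one plausible formulation, in which each token attends over its own representation at every earlier depth; the class name FuseAttention, the single-query-over-depth layout, and the surrounding encoder loop are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class FuseAttention(nn.Module):
    """Sketch of a layer-wise fusion module: each token attends over the
    stack of all previous layers' representations, so deeper layers can
    recover information that shallow residual connections dilute."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, history: list) -> torch.Tensor:
        # x:       (batch, seq_len, d_model), the current layer's output
        # history: outputs of all earlier layers (including the embeddings)
        b, t, d = x.shape
        # Stack history along a "depth" axis: (batch * seq_len, depth, d_model)
        mem = torch.stack(history, dim=2).reshape(b * t, len(history), d)
        q = x.reshape(b * t, 1, d)                  # one query per token
        fused, _ = self.attn(q, mem, mem)           # attend over depth
        return self.norm(x + fused.reshape(b, t, d))  # residual + norm


# Toy usage inside an encoder loop (all names here are illustrative):
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(512, 8, batch_first=True) for _ in range(6)])
fusers = nn.ModuleList([FuseAttention(512) for _ in range(6)])

x = torch.randn(2, 10, 512)            # embedded input sequence
history = [x]
for layer, fuser in zip(layers, fusers):
    x = fuser(layer(x), history)       # fuse earlier layers back in
    history.append(x)
```

This sketch only conveys the shape of the computation: deeper layers regain direct access to shallower representations instead of relying on the per-layer residual path alone.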