GDOD: Effective Gradient Descent using Orthogonal Decomposition for Multi-Task Learning

Cited by: 1
Authors
Dong, Xin [1]
Wu, Ruize [2]
Xiong, Chao [1]
Li, Hai [1]
Cheng, Lei [2]
He, Yong [2]
Qian, Shiyou [3]
Cao, Jian [3]
Mo, Linjian [1]
Affiliations
[1] Ant Group, Shanghai, China
[2] Ant Group, Hangzhou, China
[3] Shanghai Jiao Tong University, Shanghai, China
Keywords
multi-task learning; orthogonal decomposition; gradient conflict
DOI
10.1145/3511808.3557333
CLC Classification
TP [Automation Technology, Computer Technology]
Subject Classification
0812
Abstract
Multi-task learning (MTL) aims to solve multiple related tasks simultaneously and has experienced rapid growth in recent years. However, MTL models often suffer from performance degeneration and negative transfer because several tasks are learned at once. Some related work attributes this problem to conflicting gradients, in which case gradient updates that are useful to all tasks must be selected carefully. To this end, we propose a novel optimization approach for MTL, named GDOD, which manipulates the gradient of each task using an orthogonal basis decomposed from the span of all task gradients. GDOD explicitly decomposes gradients into task-shared and task-conflict components and adopts a general update rule that avoids interference across all task gradients, guiding the update direction by the task-shared components. Moreover, we prove the convergence of GDOD theoretically under both convex and non-convex assumptions. Experimental results on several multi-task datasets not only demonstrate the significant improvement GDOD brings to existing MTL models but also show that our algorithm outperforms state-of-the-art optimization methods in terms of AUC and Logloss.
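
A concrete reading of the update rule sketched in the abstract may help. Below is a minimal NumPy sketch of the described decomposition: build an orthogonal basis of the span of all task gradients, treat a basis direction as task-shared when every task's projection onto it agrees in sign, and aggregate only those shared components into the update. The function name gdod_update, the sign-agreement test, and the use of a QR factorization to obtain the basis are illustrative assumptions, not the authors' implementation, which may handle the conflicting components differently.

    import numpy as np

    def gdod_update(task_grads, eps=1e-12):
        """Illustrative GDOD-style update (a sketch, not the paper's code).

        task_grads: array of shape (K, d), one flattened gradient per task.
        Returns a single (d,) update direction built from the components
        on which all K tasks agree.
        """
        G = np.asarray(task_grads, dtype=float)   # K x d
        # Orthonormal basis of span{g_1, ..., g_K} via thin QR on G^T
        # (assumes the task gradients are linearly independent).
        Q, _ = np.linalg.qr(G.T)                  # d x K, orthonormal columns
        coeffs = G @ Q                            # K x K, row i = g_i in the basis
        # A direction is task-shared when every task's coefficient on it
        # has the same sign (zeros, within eps, conflict with nothing).
        shared = (np.all(coeffs >= -eps, axis=0) |
                  np.all(coeffs <= eps, axis=0))
        # Sum the task contributions along shared directions only;
        # conflicting directions are dropped from the update.
        return Q[:, shared] @ coeffs[:, shared].sum(axis=0)

    # Toy example: the tasks conflict on the first coordinate but agree
    # on the second, so only the second survives in the update.
    g1 = np.array([1.0, 0.0])
    g2 = np.array([-1.0, 1.0])
    print(gdod_update([g1, g2]))  # -> approximately [0. 1.]
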
Pages: 386-395
Page count: 10
Related Papers
50 results in total
  • [1] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    [J]. MEMETIC COMPUTING, 2020, 12 (04): 355-369
  • [2] Conflict-Averse Gradient Descent for Multi-task Learning
    Liu, Bo
    Liu, Xingchao
    Jin, Xiaojie
    Stone, Peter
    Liu, Qiang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Gradient Surgery for Multi-Task Learning
    Yu, Tianhe
    Kumar, Saurabh
    Gupta, Abhishek
    Levine, Sergey
    Hausman, Karol
    Finn, Chelsea
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] A Multiple Gradient Descent Design for Multi-Task Learning on Edge Computing: Multi-Objective Machine Learning Approach
    Zhou, Xiaojun
    Gao, Yuan
    Li, Chaojie
    Huang, Zhaoke
    [J]. IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (01): 121-133
  • [5] Drivetrain System Identification in a Multi-Task Learning Strategy using Partial Asynchronous Elastic Averaging Stochastic Gradient Descent
    Staessens, Tom
    Crevecoeur, Guillaume
    [J]. 2020 IEEE/ASME INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT MECHATRONICS (AIM), 2020: 1549-1554
  • [6] Gradient Descent Decomposition for Multi-objective Learning
    Costa, Marcelo Azevedo
    Braga, Antonio Padua
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2011, 2011, 6936: 377+
  • [7] Learned Weight Sharing for Deep Multi-Task Learning by Natural Evolution Strategy and Stochastic Gradient Descent
    Prellberg, Jonas
    Kramer, Oliver
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [8] Online Multi-Task Learning for Policy Gradient Methods
    Ammar, Haitham Bou
    Eaton, Eric
    Ruvolo, Paul
    Taylor, Matthew E.
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32: 1206-1214
  • [9] Skills Regularized Task Decomposition for Multi-task Offline Reinforcement Learning
    Yoo, Minjong
    Cho, Sangwoo
    Woo, Honguk
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022