Vision Transformer Adapters for Generalizable Multitask Learning

Cited by: 2
Authors
Bhattacharjee, Deblina [1 ]
Susstrunk, Sabine [1 ]
Salzmann, Mathieu [1 ]
Affiliations
[1] Ecole Polytech Fed Lausanne, Sch Comp & Commun Sci, Lausanne, Switzerland
Funding
Swiss National Science Foundation
Keywords
DOI
10.1109/ICCV51070.2023.01743
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405
Abstract
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to the following settings: zero-shot task transfer, unsupervised domain adaptation, and generalization without fine-tuning to novel domains. We demonstrate that our approach outperforms not only the existing convolutional neural network-based multitasking methods but also the vision transformer-based ones. Our project page is at https://ivrl.github.io/VTAGML.
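The core idea in the abstract, combining gradient-based task similarities with attention-based ones into a task-affinity matrix that re-weights task features, could be sketched as below. This is a minimal illustration, not the paper's actual formulation: the function names, the convex mixing weight `alpha`, and the row-wise softmax normalization are all assumptions made for the sake of the example.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def task_affinity(grad_feats, attn_feats, alpha=0.5):
    """Combine gradient-based and attention-based task similarities
    into one task-affinity matrix (rows normalized via softmax).
    `alpha` is a hypothetical mixing weight, not from the paper."""
    T = len(grad_feats)
    A = np.zeros((T, T))
    for i in range(T):
        for j in range(T):
            g = cosine_sim(grad_feats[i], grad_feats[j])   # gradient-based
            s = cosine_sim(attn_feats[i], attn_feats[j])   # attention-based
            A[i, j] = alpha * g + (1.0 - alpha) * s
    # Row-wise softmax so each task's affinities sum to 1.
    e = np.exp(A - A.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mix_task_features(task_feats, affinity):
    """Re-weight each task's features by its affinity to all tasks."""
    F = np.stack(task_feats)   # shape (T, D)
    return affinity @ F        # shape (T, D)
```

Because the affinity matrix is learned from similarities rather than hard-coded per task, a new task only needs its own gradient and attention statistics to be mixed in, which is consistent with the no-retraining claim above.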
Pages: 18969-18980
Page count: 12
Related papers
50 records in total
  • [1] End-to-End Multitask Learning With Vision Transformer
    Tian, Yingjie
    Bai, Kunlong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (07) : 9579 - 9590
  • [2] A Multitask Learning-Based Vision Transformer for Plant Disease Localization and Classification
    Hemalatha, S.
    Jayachandran, Jai Jaganath Babu
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [3] Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search
    Cheng, Zhi-Qi
    Wu, Xiao
    Huang, Siyu
    Li, Jun-Xiu
    Hauptmann, Alexander G.
    Peng, Qiang
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 90 - 98
  • [4] Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer
    Han, Yunhai
    Yu, Kelin
    Batra, Rahul
    Boyd, Nathan
    Mehta, Chaitanya
    Zhao, Tuo
    She, Yu
    Hutchinson, Seth
    Zhao, Ye
    [J]. IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2024,
  • [5] UniT: Multimodal Multitask Learning with a Unified Transformer
    Hu, Ronghang
    Singh, Amanpreet
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1419 - 1429
  • [6] PVSPE: A pyramid vision multitask transformer network for spacecraft pose estimation
    Yang, Hong
    Xiao, Xueming
    Yao, Meibao
    Xiong, Yonggang
    Cui, Hutao
    Fu, Yuegang
    [J]. ADVANCES IN SPACE RESEARCH, 2024, 74 (03) : 1327 - 1342
  • [7] MulT: An End-to-End Multitask Learning Transformer
    Bhattacharjee, Deblina
    Zhang, Tong
    Suesstrunk, Sabine
    Salzmann, Mathieu
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 12021 - 12031
  • [8] CONTINUAL LEARNING IN VISION TRANSFORMER
    Takeda, Mana
    Yanai, Keiji
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 616 - 620
  • [9] Compact and Efficient Multitask Learning in Vision, Language and Speech
    Al-Rawi, Mohammed
    Valveny, Ernest
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 2933 - 2942
  • [10] Multiattribute multitask transformer framework for vision-based structural health monitoring
    Gao, Yuqing
    Yang, Jianfei
    Qian, Hanjie
    Mosalam, Khalid M.
    [J]. COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2023, 38 (17) : 2358 - 2377