Multi-Task Learning With Multi-Query Transformer for Dense Prediction

Cited: 7
Authors
Xu, Yangyang [1 ]
Li, Xiangtai [2 ]
Yuan, Haobo [1 ]
Yang, Yibo [3 ]
Zhang, Lefei [1 ,4 ]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Nanyang Technol Univ, S Lab, Singapore 637335, Singapore
[3] JD Explore Acad, Beijing 101111, Peoples R China
[4] Hubei Luojia Lab, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene understanding; multi-task learning; dense prediction; transformers; NETWORK;
DOI
10.1109/TCSVT.2023.3292995
CLC classification
TM [Electrical technology]; TN [Electronics and communication technology];
Discipline codes
0808; 0809;
Abstract
Previous multi-task dense prediction studies developed complex pipelines, such as multi-modal distillation in multiple stages or searching for task relational contexts for each task. The core insight behind these methods is to maximize the mutual effects of the tasks. Inspired by recent query-based Transformers, we propose a simple pipeline named Multi-Query Transformer (MQTransformer), which is equipped with multiple queries from different tasks to facilitate reasoning among multiple tasks and to simplify the cross-task interaction pipeline. Instead of modeling the dense per-pixel context among different tasks, we seek a task-specific proxy to perform cross-task reasoning via multiple queries, where each query encodes task-related context. MQTransformer is composed of three key components: a shared encoder, a cross-task query attention module, and a shared decoder. We first model each task with a task-relevant query. Both the task-specific feature output by the feature extractor and the task-relevant query are then fed into the shared encoder, which encodes the task-relevant query from the task-specific feature. Second, we design a cross-task query attention module to reason about the dependencies among multiple task-relevant queries; this enables the module to focus only on query-level interaction. Finally, we use a shared decoder to gradually refine the image features with the reasoned query features from the different tasks. Extensive experimental results on two dense prediction datasets (NYUD-v2 and PASCAL-Context) show that the proposed method is effective and achieves state-of-the-art results.
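The cross-task query attention described in the abstract — self-attention over one learned query per task, so that tasks interact only at the query level rather than per pixel — can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation: the function name, the single attention head, and the randomly initialized projection weights are all placeholders for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_query_attention(queries):
    """Self-attention across task-relevant queries.

    queries: array of shape (T, C) -- one C-dim query vector per task,
    assumed already encoded from its task-specific feature map by the
    shared encoder. Each task's proxy attends to every other task's
    context, so interaction stays at the query level.
    """
    T, C = queries.shape
    rng = np.random.default_rng(0)
    # Placeholder learned projections (would be trained in practice).
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = queries @ Wq, queries @ Wk, queries @ Wv
    attn = softmax(q @ k.T / np.sqrt(C))  # (T, T) cross-task affinities
    return queries + attn @ v             # residual refinement of each query
```

Because attention runs over only T task tokens (not H x W pixels), the interaction cost is O(T^2 C), independent of image resolution — consistent with the abstract's claim that the module focuses only on query-level interaction.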
Pages: 1228-1240
Number of pages: 13