Multi-Task Learning With Multi-Query Transformer for Dense Prediction

Cited by: 15
Authors
Xu, Yangyang [1]
Li, Xiangtai [2]
Yuan, Haobo [1]
Yang, Yibo [3]
Zhang, Lefei [1,4]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Nanyang Technol Univ, S Lab, Singapore 637335, Singapore
[3] JD Explore Acad, Beijing 101111, Peoples R China
[4] Hubei Luojia Lab, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene understanding; multi-task learning; dense prediction; transformers; network
DOI
10.1109/TCSVT.2023.3292995
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Previous multi-task dense prediction studies developed complex pipelines, such as multi-modal distillation in multiple stages or searching for task-relational context for each task. The core insight behind these methods is to maximize the mutual effects of each task. Inspired by recent query-based Transformers, we propose a simple pipeline named Multi-Query Transformer (MQTransformer) that is equipped with multiple queries from different tasks to facilitate reasoning among multiple tasks and simplify the cross-task interaction pipeline. Instead of modeling the dense per-pixel context among different tasks, we seek a task-specific proxy that performs cross-task reasoning via multiple queries, where each query encodes task-related context. The MQTransformer is composed of three key components: a shared encoder, a cross-task query attention module, and a shared decoder. First, we model each task with a task-relevant query; both the task-specific feature output by the feature extractor and the task-relevant query are fed into the shared encoder, so that the task-relevant query is encoded from the task-specific feature. Second, we design a cross-task query attention module to reason about the dependencies among multiple task-relevant queries, which lets the module focus solely on query-level interaction. Finally, we use a shared decoder to gradually refine the image features with the reasoned query features from different tasks. Extensive experimental results on two dense prediction datasets (NYUD-v2 and PASCAL-Context) show that the proposed method is effective and achieves state-of-the-art results.
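The key efficiency claim in the abstract is that cross-task attention operates only on a small set of per-task queries rather than on dense per-pixel features. The following is a minimal PyTorch sketch of such query-level interaction; the class name, layer choices, and dimensions are illustrative assumptions for exposition, not the authors' released implementation.

```python
from typing import List

import torch
import torch.nn as nn


class CrossTaskQueryAttention(nn.Module):
    """Hypothetical sketch of cross-task query attention: each task
    contributes a small set of query vectors, and self-attention runs
    only across the concatenated queries (query-level interaction),
    never over dense per-pixel feature maps."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, task_queries: List[torch.Tensor]) -> List[torch.Tensor]:
        # task_queries: one (B, Q_t, C) tensor per task, each assumed to
        # already encode task-relevant context from a shared encoder.
        sizes = [q.shape[1] for q in task_queries]
        x = torch.cat(task_queries, dim=1)        # (B, sum(Q_t), C)
        # Self-attention over the concatenated queries lets every task's
        # queries attend to every other task's queries.
        out, _ = self.attn(x, x, x)
        out = self.norm(x + out)                  # residual + layer norm
        return list(out.split(sizes, dim=1))      # split back per task


if __name__ == "__main__":
    # Toy usage: 3 tasks, 5 queries each, 256-dim embeddings, batch of 2.
    queries = [torch.randn(2, 5, 256) for _ in range(3)]
    refined = CrossTaskQueryAttention(dim=256)(queries)
    print([q.shape for q in refined])  # 3 x torch.Size([2, 5, 256])
```

Because attention cost here grows with the number of queries (a few per task) rather than the number of pixels, this is far cheaper than dense cross-task pixel attention; the refined queries would then condition a shared decoder that progressively refines the image features.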
Pages: 1228-1240
Number of pages: 13
Related Papers
50 in total
  • [41] Evaluating Multi-Query Sessions
    Kanoulas, Evangelos
    Carterette, Ben
    Clough, Paul D.
    Sanderson, Mark
    PROCEEDINGS OF THE 34TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR'11), 2011, : 1053 - 1062
  • [42] Pipelining in multi-query optimization
    Dalvi, NN
    Sanghai, SK
    Roy, P
    Sudarshan, S
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) : 728 - 762
  • [43] SPARQL Multi-Query Optimization
    Chen, Jiaqi
    Zhang, Fan
    Zou, Lei
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (IEEE TRUSTCOM) / 12TH IEEE INTERNATIONAL CONFERENCE ON BIG DATA SCIENCE AND ENGINEERING (IEEE BIGDATASE), 2018, : 1419 - 1425
  • [44] Multi-query Video Retrieval
    Wang, Zeyu
    Wu, Yu
    Narasimhan, Karthik
    Russakovsky, Olga
    COMPUTER VISION - ECCV 2022, PT XIV, 2022, 13674 : 233 - 249
  • [45] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
    Ainslie, Joshua
    Lee-Thorp, James
    de Jong, Michiel
    Zemlyanskiy, Yury
    Lebron, Federico
    Sanghai, Sumit
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 4895 - 4901
  • [46] Multi-Task Learning Using Task Dependencies for Face Attributes Prediction
    Fan, Di
    Kim, Hyunwoo
    Kim, Junmo
    Liu, Yunhui
    Huang, Qiang
    APPLIED SCIENCES-BASEL, 2019, 9 (12):
  • [47] Learning Multi-Level Task Groups in Multi-Task Learning
    Han, Lei
    Zhang, Yu
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2638 - 2644
  • [48] Unified Transformer Multi-Task Learning for Intent Classification With Entity Recognition
    Benayas Alamos, Alberto Jose
    Hashempou, Reyhaneh
    Rumble, Damian
    Jameel, Shoaib
    De Amorim, Renato Cordeiro
    IEEE ACCESS, 2021, 9 : 147306 - 147314
  • [49] Multi-Task Multi-Sample Learning
    Aytar, Yusuf
    Zisserman, Andrew
    COMPUTER VISION - ECCV 2014 WORKSHOPS, PT III, 2015, 8927 : 78 - 91
  • [50] Bidirectional Transformer Based Multi-Task Learning for Natural Language Understanding
    Tripathi, Suraj
    Singh, Chirag
    Kumar, Abhay
    Pandey, Chandan
    Jain, Nishant
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019), 2019, 11608 : 54 - 65