Multi-Task Learning With Multi-Query Transformer for Dense Prediction

Cited by: 15
Authors
Xu, Yangyang [1 ]
Li, Xiangtai [2 ]
Yuan, Haobo [1 ]
Yang, Yibo [3 ]
Zhang, Lefei [1 ,4 ]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] Nanyang Technol Univ, S Lab, Singapore 637335, Singapore
[3] JD Explore Acad, Beijing 101111, Peoples R China
[4] Hubei Luojia Lab, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Scene understanding; multi-task learning; dense prediction; transformers; NETWORK;
DOI
10.1109/TCSVT.2023.3292995
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic & Communication Technology];
Discipline Classification Codes
0808 ; 0809 ;
Abstract
Previous multi-task dense prediction studies developed complex pipelines, such as multi-modal distillation in multiple stages or searching for task-relational contexts for each task. The core insight behind these methods is to maximize the mutual effects of each task. Inspired by recent query-based Transformers, we propose a simple pipeline named Multi-Query Transformer (MQTransformer), which is equipped with multiple queries from different tasks to facilitate reasoning among multiple tasks and simplify the cross-task interaction pipeline. Instead of modeling the dense per-pixel context among different tasks, we seek a task-specific proxy that performs cross-task reasoning via multiple queries, where each query encodes task-related context. The MQTransformer is composed of three key components: a shared encoder, a cross-task query attention module, and a shared decoder. We first model each task with a task-relevant query. Then both the task-specific feature output by the feature extractor and the task-relevant query are fed into the shared encoder, which encodes the task-relevant query from the task-specific feature. Next, we design a cross-task query attention module to reason about the dependencies among multiple task-relevant queries; this enables the module to focus solely on query-level interaction. Finally, we use a shared decoder to gradually refine the image features with the reasoned query features from different tasks. Extensive experimental results on two dense prediction datasets (NYUD-v2 and PASCAL-Context) show that the proposed method is effective and achieves state-of-the-art results.
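To make the query-level interaction concrete, below is a minimal PyTorch sketch of how a cross-task query attention step could look under the reading above: self-attention over one task-relevant query per task rather than over dense per-pixel features. The class name CrossTaskQueryAttention, the tensor shapes, and all hyperparameters are illustrative assumptions for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class CrossTaskQueryAttention(nn.Module):
    """Self-attention over a small set of task-relevant queries (one per
    task), so cross-task interaction happens at the query level instead
    of densely per pixel. Hypothetical sketch; names/shapes are assumed."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, task_queries: torch.Tensor) -> torch.Tensor:
        # task_queries: (batch, num_tasks, dim) -- each query is assumed to
        # already encode task-related context via a shared encoder.
        attended, _ = self.attn(task_queries, task_queries, task_queries)
        return self.norm(task_queries + attended)  # residual + norm

# Usage sketch: mix dependencies among 4 task queries of width 256.
module = CrossTaskQueryAttention(dim=256)
queries = torch.randn(2, 4, 256)   # (batch=2, num_tasks=4, dim=256)
refined = module(queries)          # same shape; queries now cross-task aware
```

Because attention here runs over only num_tasks tokens rather than H x W pixel positions per task pair, the cross-task interaction cost is independent of spatial resolution, which is the simplification the abstract emphasizes.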
Pages: 1228 - 1240
Number of Pages: 13
Related Papers
50 records in total
  • [31] Deep Multi-task Learning for Air Quality Prediction
    Wang, Bin
    Yan, Zheng
    Lu, Jie
    Zhang, Guangquan
    Li, Tianrui
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 93 - 103
  • [32] Multi-task Learning for Mortality Prediction in LDCT Images
    Guo, Hengtao
    Kruger, Melanie
    Wang, Ge
    Kalra, Mannudeep K.
    Yan, Pingkun
    MEDICAL IMAGING 2020: COMPUTER-AIDED DIAGNOSIS, 2020, 11314
  • [33] Entity-aware Multi-task Learning for Query Understanding at Walmart
    Peng, Zhiyuan
    Dave, Vachik
    McNabb, Nicole
    Sharnagat, Rahul
    Magnani, Alessandro
    Liao, Ciya
    Fang, Yi
    Rajanala, Sravanthi
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 4733 - 4742
  • [34] Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer
    Yuan, Zhaoquan
    Peng, Xiao
    Wu, Xiao
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1313 - 1321
  • [35] Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
    Shen, Jiaming
    Karimzadehgan, Maryam
    Bendersky, Michael
    Qin, Zhen
    Metzler, Donald
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 2127 - 2135
  • [36] Multi-Task Transformer Visualization to build Trust for Clinical Outcome Prediction
    Antweiler, Dario
    Gallusser, Florian
    Fuchs, Georg
    2023 WORKSHOP ON VISUAL ANALYTICS IN HEALTHCARE, VAHC, 2023, : 21 - 26
  • [37] A Multi-task Transformer Architecture for Drone State Identification and Trajectory Prediction
    Souli, Nicolas
    Palamas, Andreas
    Panayiotou, Tania
    Kolios, Panayiotis
    Ellinas, Georgios
    2024 20TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING IN SMART SYSTEMS AND THE INTERNET OF THINGS, DCOSS-IOT 2024, 2024, : 285 - 291
  • [38] A Transformer-Embedded Multi-Task Model for Dose Distribution Prediction
    Wen, Lu
    Xiao, Jianghong
    Tan, Shuai
    Wu, Xi
    Zhou, Jiliu
    Peng, Xingchen
    Wang, Yan
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2023, 33 (08)
  • [39] A multi-query optimizer for Monet
    Manegold, S
    Pellenkoft, A
    Kersten, M
    ADVANCES IN DATABASES, 2000, 1832 : 36 - 50
  • [40] Going Beyond Multi-Task Dense Prediction with Synergy Embedding Models
    Huang, Huimin
    Huang, Yawen
    Lin, Lanfen
    Tong, Ruofeng
    Chen, Yen-Wei
    Zheng, Hao
    Li, Yuexiang
    Zheng, Yefeng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 28181 - 28190