InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding

Cited by: 0
Authors
Ye, Hanrong [1 ]
Xu, Dan [1 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Keywords
Dense prediction; multi-task learning; scene understanding; transformer;
DOI
10.1109/TPAMI.2024.3397031
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multi-task scene understanding aims to design models that simultaneously predict several scene understanding tasks with one versatile model. Previous studies typically process multi-task features in a more local manner, and thus cannot effectively learn spatially global and cross-task interactions, which hampers models' ability to fully leverage the consistency of the various tasks in multi-task learning. To tackle this problem, we propose an Inverted Pyramid multi-task Transformer capable of modeling cross-task interaction among the spatial features of different tasks in a global context. Specifically, we first utilize a transformer encoder to capture task-generic features for all tasks. Then, we design a transformer decoder that establishes spatial and cross-task interaction globally, and devise a novel UP-Transformer block that gradually increases the resolution of the multi-task features and establishes cross-task interaction at different scales. Furthermore, two types of Cross-Scale Self-Attention modules, i.e., Fusion Attention and Selective Attention, are proposed to efficiently facilitate cross-task interaction across different feature scales. An Encoder Feature Aggregation strategy is further introduced to better model multi-scale information in the decoder. Comprehensive experiments on several 2D/3D multi-task benchmarks clearly demonstrate the effectiveness of our proposal, which achieves significant state-of-the-art performance.
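The core idea in the abstract — letting the spatial tokens of all tasks attend to one another globally, then raising feature resolution scale by scale — can be loosely illustrated with a minimal numpy sketch. This is an illustrative simplification, not the authors' implementation: the function names, single-head attention, shared projection weights across tasks, and nearest-neighbour upsampling are all assumptions made for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_task_self_attention(task_feats, Wq, Wk, Wv):
    """Global self-attention over the tokens of ALL tasks jointly.

    task_feats: list of T arrays, each of shape (H*W, C) -- per-task
    spatial tokens. Concatenating them into one sequence lets every
    token attend to every spatial position of every task, i.e. the
    attention is both spatially global and cross-task.
    (Hypothetical single-head sketch; Wq, Wk, Wv are (C, C) projections.)
    """
    T = len(task_feats)
    n = task_feats[0].shape[0]
    x = np.concatenate(task_feats, axis=0)           # (T*H*W, C)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))   # (T*n, T*n)
    out = attn @ v
    # Split the fused sequence back into per-task token maps.
    return [out[i * n:(i + 1) * n] for i in range(T)]

def upsample2x(feat, h, w):
    """Nearest-neighbour 2x spatial upsampling of (h*w, C) tokens,
    standing in for the resolution increase inside an UP-Transformer
    block (the inverted-pyramid step)."""
    c = feat.shape[-1]
    grid = feat.reshape(h, w, c)
    grid = grid.repeat(2, axis=0).repeat(2, axis=1)  # (2h, 2w, c)
    return grid.reshape(4 * h * w, c)
```

A decoder in this spirit would alternate the two steps: run cross-task attention at the current scale, upsample each task's tokens 2x, and repeat at the finer scale, so that cross-task interaction is established at every resolution of the pyramid.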
Pages: 7493 - 7508
Page count: 16