Patch-Based Separable Transformer for Visual Recognition

Cited by: 1
Authors
Sun, Shuyang [1]
Yue, Xiaoyu [2]
Zhao, Hengshuang [3]
Torr, Philip H. S. [1]
Bai, Song [4]
Affiliations
[1] Univ Oxford, Dept Engn Sci, Oxford OX1 2JD, England
[2] Univ Sydney, Camperdown, NSW 2006, Australia
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] ByteDance AI Lab, Beijing 100086, Peoples R China
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Task analysis; Current transformers; Visualization; Computer architecture; Feature extraction; Convolutional neural networks; Object detection; Transformer; image classification; object detection; instance segmentation;
DOI
10.1109/TPAMI.2022.3231725
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The computational complexity of transformers prevents them from being widely deployed in frameworks for visual recognition. Recent work (Dosovitskiy et al., 2021) significantly accelerates network processing by reducing the resolution at the beginning of the network; however, unlike CNNs, it is still hard to generalize directly to downstream tasks such as object detection and segmentation. In this paper, we present a transformer-based architecture that retains both local and global interactions within the network and is transferable to other downstream tasks. The proposed architecture factorizes the original full spatial self-attention into pixel-wise local attention and patch-wise global attention. This factorization reduces the computational cost while retaining information at different granularities, which helps generate the multi-scale features required by different tasks. By exploiting the factorized attention, we construct a Separable Transformer (SeT) for visual modeling. Experimental results show that SeT outperforms previous state-of-the-art transformer-based approaches and its CNN counterparts on three major tasks: image classification, object detection, and instance segmentation.
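The factorization described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how patch tokens are formed or how the two attention outputs are merged, so the mean-pooling of each patch into one token, the additive merge, the omission of learned query/key/value projections, and all function names (`to_patches`, `factorized_attention`, `pairwise_scores`) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def to_patches(x, p):
    # (H, W, C) -> (num_patches, p*p, C), non-overlapping p x p patches.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p, C)

def from_patches(t, H, W, p):
    # Inverse of to_patches: (num_patches, p*p, C) -> (H, W, C).
    C = t.shape[-1]
    t = t.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return t.reshape(H, W, C)

def factorized_attention(x, p):
    # Assumes H and W are divisible by p. No learned projections:
    # queries, keys, and values are the raw features (simplification).
    H, W, C = x.shape
    patches = to_patches(x, p)
    # Pixel-wise local attention: pixels attend only within their patch.
    local = attention(patches, patches, patches)
    # One token per patch via mean pooling (assumption).
    tokens = local.mean(axis=1)
    # Patch-wise global attention: patch tokens attend to each other.
    global_tokens = attention(tokens, tokens, tokens)
    # Broadcast global context back to every pixel (assumed additive merge).
    out = local + global_tokens[:, None, :]
    return from_patches(out, H, W, p)

def pairwise_scores(H, W, p):
    # Number of attention scores: full attention vs. local + global.
    n = H * W
    num_patches = n // (p * p)
    full = n * n
    local = num_patches * (p * p) ** 2
    glob = num_patches ** 2
    return full, local + glob
```

Under these assumptions, `pairwise_scores(H, W, p)` compares the number of attention scores computed by full spatial self-attention with the number computed by the local and global stages combined, which is where the cost saving comes from.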
Pages: 9241-9247
Number of pages: 7
Related papers
50 in total
  • [1] DPT: Deformable Patch-based Transformer for Visual Recognition
    Chen, Zhiyang
    Zhu, Yousong
    Zhao, Chaoyang
    Hu, Guosheng
    Zeng, Wei
    Wang, Jinqiao
    Tang, Ming
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2899 - 2907
  • [2] A Review of Codebook Models in Patch-Based Visual Object Recognition
    Amirthalingam Ramanan
    Mahesan Niranjan
    [J]. Journal of Signal Processing Systems, 2012, 68 : 333 - 352
  • [3] A Review of Codebook Models in Patch-Based Visual Object Recognition
    Ramanan, Amirthalingam
    Niranjan, Mahesan
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2012, 68 (03): : 333 - 352
  • [4] Visual tracking with structured patch-based model
    Li, Fu
    Jia, Xu
    Xiang, Cheng
    Lu, Huchuan
    [J]. IMAGE AND VISION COMPUTING, 2017, 60 : 124 - 133
  • [5] A PATCH-BASED SPARSE REPRESENTATION FOR SKETCH RECOGNITION
    Qi Yonggang
    Zhang Honggang
    Song Yizhe
    Tan Zhenghua
    [J]. 2014 4TH IEEE INTERNATIONAL CONFERENCE ON NETWORK INFRASTRUCTURE AND DIGITAL CONTENT (IEEE IC-NIDC), 2014, : 343 - 346
  • [6] PATCH-BASED FACE RECOGNITION FROM VIDEO
    Hu, Changbo
    Harguess, Josh
    Aggarwal, J. K.
    [J]. 2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 3321 - 3324
  • [7] Patch-based attack on traffic sign recognition
    Ye, Bin
    Yin, Huilin
    Yan, Jun
    Ge, Wanchen
    [J]. 2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2021, : 164 - 171
  • [8] Patch-based Scale Calculation for Visual Tracking
    Xu, Yulong
    Zhang, Yafei
    Wang, Jiabao
    Li, Yang
    Li, Hang
    [J]. 2015 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP), 2015,
  • [9] Random Sampling for Patch-based Face Recognition
    Cheheb, Ismahane
    Al-Maadeed, Noor
    Al-Madeed, Somaya
    Bouridane, Ahmed
    Jiang, Richard
    [J]. 2017 5TH INTERNATIONAL WORKSHOP ON BIOMETRICS AND FORENSICS (IWBF 2017), 2017,
  • [10] Patch-Based Transformer for Low-Light Image Enhancement
    Zhang, Yu
    Jiang, Shan
    Tang, Xiangyun
    [J]. 2023 IEEE INTERNATIONAL CONFERENCES ON INTERNET OF THINGS, ITHINGS IEEE GREEN COMPUTING AND COMMUNICATIONS, GREENCOM IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING, CPSCOM IEEE SMART DATA, SMARTDATA AND IEEE CONGRESS ON CYBERMATICS,CYBERMATICS, 2024, : 268 - 273