Hyneter: Hybrid Network Transformer for Multiple Computer Vision Tasks

Cited by: 3
Authors
Chen, Dong [1]
Miao, Duoqian [2]
Zhao, Xuerong [3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
[2] Tongji Univ, Shanghai 200092, Peoples R China
[3] Shanghai Normal Univ, Comp Sci & Technol Sch, Shanghai 201418, Peoples R China
Keywords
Convolutional neural network (CNN); hybrid network; object detection; transformer
DOI
10.1109/TII.2024.3367043
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this article, we point out that the essential difference between convolutional neural network (CNN)-based and transformer-based detectors, which causes transformer-based methods to perform worse on small objects, is the gap between local information and global dependencies in feature extraction and propagation. Preliminary experiments indicate that this gap causes CNN-based and transformer-based methods to improve results unevenly across objects of different sizes. To address this difference, we propose a new vision transformer called Hybrid Network Transformer (Hyneter). Unlike the divide-and-conquer strategy of previous methods, Hyneters consist of a hybrid network backbone (HNB) and a dual switching (DS) module, which integrate local information and global dependencies and transfer them simultaneously. Following this balance strategy, HNB extends the range of local information by embedding convolution layers into transformer blocks in parallel, and DS adjusts excessive reliance on global dependencies outside the patch. Ablation studies show that Hyneters achieve state-of-the-art performance by a large margin of +2.1~13.2 AP on COCO and +3.1~6.5 mIoU on VisDrone, with a lighter model size and lower computational cost in object detection. Furthermore, Hyneters achieve state-of-the-art results on multiple computer vision tasks, such as object detection (60.1 AP on COCO and 46.1 AP on VisDrone), semantic segmentation (54.3 AP on ADE20K), and instance segmentation (48.5 AP(mask) on COCO), and surpass the previous best methods. The code will be made publicly available later.
Pages: 8773-8785
Page count: 13