Hyneter: Hybrid Network Transformer for Multiple Computer Vision Tasks

Cited by: 3
Authors
Chen, Dong [1]
Miao, Duoqian [2]
Zhao, Xuerong [3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
[2] Tongji Univ, Shanghai 200092, Peoples R China
[3] Shanghai Normal Univ, Comp Sci & Technol Sch, Shanghai 201418, Peoples R China
Keywords
Convolutional neural network (CNN); hybrid network; object detection; transformer
DOI
10.1109/TII.2024.3367043
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we point out that the essential difference between convolutional neural network (CNN)-based and transformer-based detectors, which causes the worse performance of transformer-based methods on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision transformer, called Hybrid Network Transformer (Hyneter), after preliminary experiments indicating that this gap causes CNN-based and transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneters consist of a hybrid network backbone (HNB) and a dual switching (DS) module, which integrate local information and global dependencies and transfer them simultaneously. Based on this balance strategy, HNB extends the range of local information by embedding convolution layers into transformer blocks in parallel, and DS adjusts excessive reliance on global dependencies outside the patch. Ablation studies illustrate that Hyneters achieve state-of-the-art performance by a large margin of +2.1∼13.2 AP on COCO and +3.1∼6.5 mIoU on VisDrone, with lighter model size and lower computational cost in object detection. Furthermore, Hyneters achieve state-of-the-art results on multiple computer vision tasks, such as object detection (60.1 AP on COCO and 46.1 AP on VisDrone), semantic segmentation (54.3 AP on ADE20K), and instance segmentation (48.5 AP(mask) on COCO), and surpass the previous best methods. The code will be publicly available later.
Pages: 8773-8785
Number of pages: 13
Related papers
50 in total
  • [21] Architecture for Dynamic Allocation of Computer Vision Tasks
    Weissenfeld, Axel
    Opitz, Andreas
    Pflugfelder, Roman
    Fernandez, Gustavo
    ICDSC 2016: 10TH INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERA, 2016, : 50 - 55
  • [22] Computer Vision Onboard UAVs for Civilian Tasks
    Campoy, Pascual
    Correa, Juan F.
    Mondragon, Ivan
    Martinez, Carol
    Olivares, Miguel
    Mejias, Luis
    Artieda, Jorge
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2009, 54 (1-3) : 105 - 135
  • [23] Scheduling latency insensitive computer vision tasks
    Xu, RYD
    Jin, JS
    PARALLEL AND DISTRIBUTED PROCESSING AND APPLICATIONS, 2005, 3758 : 1089 - 1100
  • [24] Waste classification using vision transformer based on multilayer hybrid convolution neural network
    Alrayes, Fatma S.
    Asiri, Mashael M.
    Maashi, Mashael S.
    Nour, Mohamed K.
    Rizwanullah, Mohammed
    Osman, Azza Elneil
    Drar, Suhanda
    Zamani, Abu Sarwar
    URBAN CLIMATE, 2023, 49
  • [25] A lightweight hybrid vision transformer network for radar-based human activity recognition
    Huan, Sha
    Wang, Zhaoyue
    Wang, Xiaoqiang
    Wu, Limei
    Yang, Xiaoxuan
    Huang, Hongming
    Dai, Gan E.
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [26] A lightweight hybrid vision transformer network for radar-based human activity recognition
    Sha Huan
    Zhaoyue Wang
    Xiaoqiang Wang
    Limei Wu
    Xiaoxuan Yang
    Hongming Huang
    Gan E. Dai
    Scientific Reports, 13
  • [27] A novel dual-granularity lightweight transformer for vision tasks
    Zhang, Ji
    Yu, Mingxin
    Lu, Wenshuai
    Dai, Yuxiang
    Shi, Huiyu
    You, Rui
    INTELLIGENT DATA ANALYSIS, 2024, 28 (05) : 1213 - 1228
  • [28] Adaptive Hybrid Vision Transformer for Small Datasets
    Yin, Mingjun
    Chang, Zhiyong
    Wang, Yan
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 873 - 880
  • [29] Pupil Detection Using Hybrid Vision Transformer
    Wang, Li
    Wang, Changyuan
    Zhang, Yu
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (12)
  • [30] Research on computer vision technology based on BP-LSTM hybrid network
    Yi, Qiaoling
    Ling, Shijia
    Chen, Guoluan
    Liu, Liangfang
    APPLIED MATHEMATICS AND NONLINEAR SCIENCES, 2023, 8 (02) : 975 - 984