Hyneter: Hybrid Network Transformer for Multiple Computer Vision Tasks

Cited by: 3
Authors
Chen, Dong [1]
Miao, Duoqian [2]
Zhao, Xuerong [3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
[2] Tongji Univ, Shanghai 200092, Peoples R China
[3] Shanghai Normal Univ, Comp Sci & Technol Sch, Shanghai 201418, Peoples R China
Keywords
Convolutional neural network (CNN); hybrid network; object detection; transformer
DOI
10.1109/TII.2024.3367043
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article, we point out that the essential difference between convolutional neural network (CNN)-based and transformer-based detectors, which causes the worse performance of transformer-based methods on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision transformer, called Hybrid Network Transformer (Hyneter), after pre-experiments indicating that the gap causes CNN-based and transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneters consist of a hybrid network backbone (HNB) and a dual switching (DS) module, which integrate local information and global dependencies and transfer them simultaneously. Based on the balance strategy, HNB extends the range of local information by embedding convolution layers into transformer blocks in parallel, and DS adjusts excessive reliance on global dependencies outside the patch. Ablation studies illustrate that Hyneters achieve state-of-the-art performance by a large margin of +2.1 to +13.2 AP on COCO and +3.1 to +6.5 mIoU on VisDrone, with lighter model size and lower computational cost in object detection. Furthermore, Hyneters achieve state-of-the-art results on multiple computer vision tasks, such as object detection (60.1 AP on COCO and 46.1 AP on VisDrone), semantic segmentation (54.3 AP on ADE20K), and instance segmentation (48.5 AP(mask) on COCO), and surpass previous best methods. The code will be publicly available later.
Pages: 8773-8785
Page count: 13