Hyneter: Hybrid Network Transformer for Multiple Computer Vision Tasks

Cited by: 3
Authors
Chen, Dong [1]
Miao, Duoqian [2]
Zhao, Xuerong [3]
Affiliations
[1] Tongji Univ, Minist Educ, Key Lab Embedded Syst & Serv Comp, Shanghai 200092, Peoples R China
[2] Tongji Univ, Shanghai 200092, Peoples R China
[3] Shanghai Normal Univ, Comp Sci & Technol Sch, Shanghai 201418, Peoples R China
Keywords
Convolutional neural network (CNN); hybrid network; object detection; transformer;
DOI
10.1109/TII.2024.3367043
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this article, we point out that the essential difference between convolutional neural network (CNN)-based and transformer-based detectors, which causes the worse performance of transformer-based methods on small objects, is the gap between local information and global dependencies in feature extraction and propagation. To address this difference, we propose a new vision transformer, called Hybrid Network Transformer (Hyneter), after pre-experiments indicating that the gap causes CNN-based and transformer-based methods to improve results unevenly across objects of different sizes. Unlike the divide-and-conquer strategy of previous methods, Hyneters consist of a hybrid network backbone (HNB) and a dual switching (DS) module, which integrate local information and global dependencies and transfer them simultaneously. Based on the balance strategy, HNB extends the range of local information by embedding convolution layers into transformer blocks in parallel, and DS adjusts excessive reliance on global dependencies outside the patch. Ablation studies illustrate that Hyneters achieve state-of-the-art performance by a large margin of +2.1~13.2 AP on COCO and +3.1~6.5 mIoU on VisDrone, with lighter model size and lower computational cost in object detection. Furthermore, Hyneters achieve state-of-the-art results on multiple computer vision tasks, such as object detection (60.1 AP on COCO and 46.1 AP on VisDrone), semantic segmentation (54.3 AP on ADE20K), and instance segmentation (48.5 AP(mask) on COCO), and surpass previous best methods. The code will be publicly available later.
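The abstract says that HNB embeds convolution layers into transformer blocks in parallel so that local information and global dependencies are carried together. The NumPy sketch below illustrates only that general idea and is not the authors' implementation: the function names (`self_attention`, `depthwise_conv3x3`, `hybrid_block`), the single attention head, the depthwise 3x3 convolution, and the fusion by residual summation are all assumptions made for illustration. The DS module is not modeled.

```python
import numpy as np


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head global self-attention over all tokens, shape (N, C)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def depthwise_conv3x3(feature_map, kernel):
    """Zero-padded depthwise 3x3 convolution.

    feature_map has shape (H, W, C); kernel has shape (3, 3, C).
    """
    h, w, _ = feature_map.shape
    padded = np.pad(feature_map, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros(feature_map.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out


def hybrid_block(feature_map, w_q, w_k, w_v, kernel):
    """Hypothetical parallel block: attention and convolution see the same input.

    The two branch outputs are added to the input as residuals (an assumed
    fusion rule; the abstract does not specify one).
    """
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)
    global_branch = self_attention(tokens, w_q, w_k, w_v).reshape(h, w, c)
    local_branch = depthwise_conv3x3(feature_map, kernel)
    return feature_map + global_branch + local_branch


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
kernel = rng.standard_normal((3, 3, 8))
y = hybrid_block(x, w_q, w_k, w_v, kernel)
```

In this sketch the output `y` keeps the input shape `(4, 4, 8)`, so such blocks could be stacked. The attention branch mixes information across all 16 positions, while the convolution branch only mixes each position with its 3x3 neighborhood.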
Pages: 8773-8785
Page count: 13