Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey

Authors
Haruna, Yunusa [1]
Qin, Shiyin [1]
Chukkol, Abdulrahman Hamman Adama [2]
Yusuf, Abdulganiyu Abdu [3]
Bello, Isah [4]
Lawan, Adamu [5]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
Keywords
Attention mechanism; Convolutional neural network; Hybrid models; Image classification; Object detection; Vision transformer
DOI
10.1016/j.engappai.2025.110057
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hybrid architectures that combine Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have emerged as a groundbreaking approach in Computer Vision (CV), significantly advancing tasks such as image classification, object detection, and segmentation. This review examines the literature on state-of-the-art hybrid CNN-ViT architectures and explores the synergies between the two approaches. The survey covers: (1) background on the vanilla CNN and ViT; (2) a systematic review of a taxonomy of hybrid designs, exploring the synergy achieved by merging CNN and ViT models; (3) a comparative analysis of hybrid architectures, including task-specific synergy and real-world applications; (4) challenges and future directions for hybrid models; and (5) a summary of key findings and recommendations. The survey aims to serve as a guiding resource, enhancing understanding of the dynamics between CNN and ViT and their impact on future developments in CV.
Pages: 26
Related papers
50 in total
  • [41] TCURL: Exploring hybrid transformer and convolutional neural network on phishing URL detection
    Wang, Chenguang
    Chen, Yuanyuan
    KNOWLEDGE-BASED SYSTEMS, 2022, 258
  • [42] Spiking Convolutional Vision Transformer
    Talafha, Sameerah
    Rekabdar, Banafsheh
    Mousas, Christos
    Ekenna, Chinwe
    2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC, 2023, : 225 - 226
  • [43] A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
    Srinivas, Suraj
    Sarvadevabhatla, Ravi Kiran
    Mopuri, Konda Reddy
    Prabhu, Nikita
    Kruthiventi, Srinivas S. S.
    Babu, R. Venkatesh
    FRONTIERS IN ROBOTICS AND AI, 2016, 2
  • [44] A Survey on Vision Transformer
    Han, Kai
    Wang, Yunhe
    Chen, Hanting
    Chen, Xinghao
    Guo, Jianyuan
    Liu, Zhenhua
    Tang, Yehui
    Xiao, An
    Xu, Chunjing
    Xu, Yixing
    Yang, Zhaohui
    Zhang, Yiman
    Tao, Dacheng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (01) : 87 - 110
  • [45] Mildew detection in rice grains based on computer vision and the YOLO convolutional neural network
    Sun, Ke
    Tang, Mengdi
    Li, Shu
    Tong, Siyuan
    FOOD SCIENCE & NUTRITION, 2024, 12 (02) : 860 - 868
  • [46] Computer Vision Detection of Salmon Muscle Gaping Using Convolutional Neural Network Features
    Xu, Jun-Li
    Sun, Da-Wen
    FOOD ANALYTICAL METHODS, 2018, 11 (01) : 34 - 47
  • [47] A Lightweight and Efficient Distracted Driver Detection Model Fusing Convolutional Neural Network and Vision Transformer
    Li, Zhao
    Zhao, Xia
    Wu, Fuwei
    Chen, Dan
    Wang, Chang
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, : 19962 - 19978
  • [48] A Lightweight Face Detector via Bi-Stream Convolutional Neural Network and Vision Transformer
    Zhang, Zekun
    Chao, Qingqing
    Wang, Shijie
    Yu, Teng
    INFORMATION, 2024, 15 (05)
  • [49] Classifying the molecular subtype of breast cancer using vision transformer and convolutional neural network features
    Kai, Chiharu
    Tamori, Hideaki
    Ohtsuka, Tsunehiro
    Nara, Miyako
    Yoshida, Akifumi
    Sato, Ikumi
    Futamura, Hitoshi
    Kodama, Naoki
    Kasai, Satoshi
    BREAST CANCER RESEARCH AND TREATMENT, 2025, : 771 - 782