Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey

Citations: 0
Authors
Haruna, Yunusa [1]
Qin, Shiyin [1]
Chukkol, Abdulrahman Hamman Adama [2]
Yusuf, Abdulganiyu Abdu [3]
Bello, Isah [4]
Lawan, Adamu [5]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
Keywords
Attention mechanism; Convolutional neural network; Hybrid models; Image classification; Object detection; Vision transformer
DOI
10.1016/j.engappai.2025.110057
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The hybrid of Convolutional Neural Network (CNN) and Vision Transformer (ViT) architectures has emerged as a groundbreaking approach that pushes the boundaries of Computer Vision (CV) and significantly advances CV tasks such as image classification, object detection, and segmentation. This review examines the literature on state-of-the-art hybrid CNN-ViT architectures, exploring the synergies between these two approaches. The main content of this survey includes: (1) a background on the vanilla CNN and ViT, (2) a systematic review of various taxonomic hybrid designs to explore the synergy achieved through merging CNN and ViT models, (3) comparative analysis, task-specific synergy, and real-world applications among various hybrid architectures, (4) challenges and future directions for hybrid models, and (5) a summary of key findings and recommendations. Through this exploration, the survey aims to serve as a guiding resource, enhancing understanding of the dynamics between CNN and ViT and their impact on future developments in CV.
Pages: 26