Exploring the synergies of hybrid convolutional neural network and Vision Transformer architectures for computer vision: A survey

Authors
Haruna, Yunusa [1]
Qin, Shiyin [1]
Chukkol, Abdulrahman Hamman Adama [2]
Yusuf, Abdulganiyu Abdu [3]
Bello, Isah [4]
Lawan, Adamu [5]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[2] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
[3] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[4] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Technol, Beijing, Peoples R China
Keywords
Attention mechanism; Convolutional neural network; Hybrid models; Image classification; Object detection; Vision transformer
DOI
10.1016/j.engappai.2025.110057
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Hybrid architectures that combine the Convolutional Neural Network (CNN) and the Vision Transformer (ViT) have emerged as a groundbreaking approach in Computer Vision (CV), significantly advancing tasks such as image classification, object detection, and segmentation. This review examines the literature on state-of-the-art hybrid CNN-ViT architectures and explores the synergies between the two approaches. The survey covers: (1) background on the vanilla CNN and ViT; (2) a systematic review of hybrid designs, organized by taxonomy, to explore the synergy achieved by merging CNN and ViT models; (3) a comparative analysis of hybrid architectures, including task-specific synergy and real-world applications; (4) challenges and future directions for hybrid models; and (5) a summary of key findings and recommendations. Through this exploration, the survey aims to serve as a guiding resource, enhancing understanding of the dynamics between CNN and ViT and their impact on future developments in CV.
Pages: 26