Enhancing Computer Vision Performance: A Hybrid Deep Learning Approach with CNNs and Vision Transformers

Cited by: 1
Authors
Sardar, Abha Singh [1]
Ranjan, Vivek [1]
Affiliations
[1] Maulana Azad Natl Inst Technol, Dept Comp Sci & Engn, Bhopal, Madhya Pradesh, India
Keywords
Convolutional Neural Networks (CNNs); Vision Transformers (ViTs); Image classification; Plant Disease; Limited data;
DOI
10.1007/978-3-031-58174-8_49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This article explores the growing prominence of deep learning algorithms in computer vision tasks, focusing on the strengths and weaknesses of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNNs have dominated computer vision tasks since their inception due to their ability to identify features irrespective of their location, scale, or orientation. However, their effectiveness is limited, particularly in modeling long-range dependencies. Conversely, ViTs, while high performing, are "data-hungry" and require substantial training data to reach their full potential, which poses a significant obstacle in areas with limited data availability such as healthcare and plant pathology. To address these limitations, we propose a hybrid approach that integrates the strengths of both CNNs and ViTs, aiming to create a robust model that performs well across a range of dataset sizes. Testing on the Plant Disease and Tomato Leaf Disease Classification datasets demonstrates the efficacy of our model, with a marked improvement in F1 score and accuracy and a significant reduction in loss compared to the base CNN. These findings demonstrate the potential of the proposed method for identifying plant diseases, contributing to advances in agricultural technology. This research initiates a discussion on balancing performance and practical data constraints in the fast-evolving field of deep learning.
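The abstract reports results in terms of accuracy and F1 score on multi-class plant disease datasets. As a minimal sketch of how such metrics are typically computed, the snippet below implements accuracy and a macro-averaged F1 in plain Python. The record does not state which F1 averaging the authors used, so macro averaging is an assumption here, and the labels are hypothetical toy data rather than results from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, not data from the paper.
y_true = ["healthy", "blight", "blight", "mosaic", "healthy", "mosaic"]
y_pred = ["healthy", "blight", "mosaic", "mosaic", "healthy", "blight"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Macro averaging weights every class equally, which is a common choice when disease classes are imbalanced. A weighted or micro average would give different numbers on the same predictions.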
Pages: 591-602
Page count: 12