Enhancing Computer Vision Performance: A Hybrid Deep Learning Approach with CNNs and Vision Transformers

Times cited: 1
Authors
Sardar, Abha Singh [1]
Ranjan, Vivek [1]
Affiliation
[1] Maulana Azad National Institute of Technology, Department of Computer Science and Engineering, Bhopal, Madhya Pradesh, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, Part II | 2024 / Vol. 2010
Keywords
Convolutional Neural Networks (CNNs); Vision Transformers (ViTs); Image classification; Plant Disease; Limited data
DOI
10.1007/978-3-031-58174-8_49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This article examines the growing prominence of deep learning algorithms in computer vision, focusing on the strengths and weaknesses of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNNs have dominated computer vision tasks since their inception owing to their ability to recognize features largely irrespective of their location, scale, or orientation. However, they are less effective at modeling long-range dependencies. Conversely, ViTs, while high-performing, are "data-hungry": they require substantial training data to reach their full potential, which is a significant obstacle in domains where data are scarce, such as healthcare and plant pathology. To address these limitations, we propose a hybrid approach that combines the strengths of CNNs and ViTs, aiming for a robust model that performs well across a range of dataset sizes. Experiments on the Plant Disease and Tomato Leaf Disease Classification datasets show that, compared with the baseline CNN, our model achieves a marked improvement in F1 score and accuracy and a substantial reduction in loss. These findings indicate the potential of the proposed method for identifying plant diseases and contribute to advances in agricultural technology. This research also opens a discussion on balancing performance against practical data constraints in the fast-evolving field of deep learning.
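The abstract does not specify the hybrid architecture. As a rough illustration only, the following NumPy sketch assumes a design in the spirit of the hybrid variant described in the original ViT paper: a small convolutional stem extracts local features, its feature map is flattened into a token sequence, and one Transformer encoder block applies global self-attention before classification. The function names (`conv2d_relu`, `hybrid_forward`, `init_params`), layer sizes, the 16×16 input, and the four-class head are all hypothetical choices, not the authors' model; weights are random and training is not shown.

```python
import numpy as np

def conv2d_relu(x, kernels, stride):
    """Valid (no padding) 2-D convolution followed by ReLU.
    x: (H, W, C_in); kernels: (k, k, C_in, C_out)."""
    k, _, _, c_out = kernels.shape
    h_out = (x.shape[0] - k) // stride + 1
    w_out = (x.shape[1] - k) // stride + 1
    out = np.empty((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Contract the patch over (k, k, C_in), leaving one value per output channel
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

def layer_norm(t, eps=1e-5):
    # Normalize each token over its feature axis (no learnable gain or bias here)
    mu = t.mean(axis=-1, keepdims=True)
    var = t.var(axis=-1, keepdims=True)
    return (t - mu) / np.sqrt(var + eps)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    # Single-head scaled dot-product attention; every token attends to every token.
    # The output projection of a full Transformer block is omitted for brevity.
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def init_params(rng, in_ch=3, c1=8, d=16, n_tokens=49, mlp=32, num_classes=4):
    # Hypothetical sizes: a 16x16 input gives 14x14 after conv1 (k=3, stride 1)
    # and 7x7 = 49 tokens after conv2 (k=2, stride 2).
    s = 0.1
    return {
        "conv1": rng.normal(0, s, (3, 3, in_ch, c1)),
        "conv2": rng.normal(0, s, (2, 2, c1, d)),
        "pos": rng.normal(0, s, (n_tokens, d)),
        "wq": rng.normal(0, s, (d, d)),
        "wk": rng.normal(0, s, (d, d)),
        "wv": rng.normal(0, s, (d, d)),
        "w1": rng.normal(0, s, (d, mlp)),
        "w2": rng.normal(0, s, (mlp, d)),
        "head": rng.normal(0, s, (d, num_classes)),
    }

def hybrid_forward(image, params):
    # 1. CNN stem: local feature extraction, then a strided conv acting as patch embedding
    fmap = conv2d_relu(image, params["conv1"], stride=1)
    fmap = conv2d_relu(fmap, params["conv2"], stride=2)
    # 2. Flatten the feature map into tokens (one per spatial position) and add positions
    tokens = fmap.reshape(-1, fmap.shape[-1]) + params["pos"]
    # 3. Pre-norm encoder block: global self-attention and an MLP, each with a residual.
    #    ReLU is used in the MLP instead of the GELU common in ViTs.
    tokens = tokens + self_attention(layer_norm(tokens), params["wq"], params["wk"], params["wv"])
    hidden = np.maximum(layer_norm(tokens) @ params["w1"], 0.0)
    tokens = tokens + hidden @ params["w2"]
    # 4. Mean-pool the tokens and map to class probabilities
    pooled = layer_norm(tokens).mean(axis=0)
    return softmax(pooled @ params["head"])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    image = rng.random((16, 16, 3))  # random stand-in for a leaf image
    probs = hybrid_forward(image, params)
    print(probs.shape)  # (4,)
```

The usual argument for such hybrids is that the convolutional stem contributes a locality bias that helps when data are limited, while self-attention lets every token interact with every other, addressing long-range dependencies. In practice both parts would be trained end to end in a deep learning framework rather than hand-written in NumPy.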
Pages: 591-602
Page count: 12