Enhancing Computer Vision Performance: A Hybrid Deep Learning Approach with CNNs and Vision Transformers

Cited by: 1
Authors
Sardar, Abha Singh [1]
Ranjan, Vivek [1]
Affiliation
[1] Maulana Azad Natl Inst Technol, Dept Comp Sci & Engn, Bhopal, Madhya Pradesh, India
Source
COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT II | 2024 / Vol. 2010
Keywords
Convolutional Neural Networks (CNNs); Vision Transformers (ViTs); Image classification; Plant Disease; Limited data;
DOI
10.1007/978-3-031-58174-8_49
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This article explores the growing prominence of deep learning algorithms in computer vision tasks, focusing on the strengths and weaknesses of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). CNNs have dominated computer vision tasks since their inception due to their ability to identify features irrespective of their location, scale, or orientation. However, their efficiency is limited, particularly in managing long-range dependencies. Conversely, ViTs, while high-performing, are "data-hungry" and require substantial training data to reach their full potential, posing a significant obstacle in areas with limited data availability such as healthcare and plant pathology. To address these limitations, we propose a hybrid approach that integrates the strengths of both CNNs and ViTs, aiming to create a robust model that is efficient across a range of data sizes. Testing on the Plant Disease and Tomato Leaf Disease Classification datasets demonstrates the efficacy of our model, with marked improvements in F1 score and accuracy and a significant reduction in loss compared to the base CNN. These findings demonstrate the potential of the suggested method in identifying plant diseases, making a significant contribution to advancements in agricultural technology. This research initiates a crucial discussion on balancing performance and practical data constraints in the fast-evolving field of deep learning.
Pages: 591-602
Number of pages: 12