Vehicle Classification Algorithm Based on Improved Vision Transformer

Cited by: 1
Authors
Dong, Xinlong [1 ]
Shi, Peicheng [1 ]
Tang, Yueyue [1 ]
Yang, Li [1 ]
Yang, Aixi [2 ]
Liang, Taonian [3 ]
Affiliations
[1] Anhui Polytech Univ, Sch Mech & Automot Engn, Wuhu 241000, Peoples R China
[2] Zhejiang Univ, Polytech Inst, Hangzhou 310015, Peoples R China
[3] Chery New Energy Automobile Co Ltd, Wuhu 241000, Peoples R China
Source
WORLD ELECTRIC VEHICLE JOURNAL | 2024, Vol. 15, No. 08
Keywords
vehicle classification; vision transformer; local detail features; sparse attention module; contrast loss;
DOI
10.3390/wevj15080344
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809
Abstract
Vehicle classification is one of the foundational technologies of automatic driving. With the development of deep learning, vision transformer architectures based on attention mechanisms can represent global information quickly and effectively. However, because the input image is directly segmented into patches, local detail features and information are lost. To address this problem, we propose an improved vision transformer vehicle classification network (IND-ViT). Specifically, we first design a CNN-In D branch module that extracts local features before image segmentation, compensating for the detail information lost by the vision transformer. Then, to reduce misclassification caused by the high visual similarity of some vehicles, we propose a sparse attention module that screens out the discernible regions of the image and further improves the model's ability to represent detailed features. Finally, a contrastive loss function is used to increase the intra-class consistency and inter-class difference of the classification features, improving vehicle classification accuracy. Experimental results show that, on the BIT-Vehicle, CIFAR-10, Oxford Flowers-102, and Caltech-101 datasets, the proposed model outperforms the original vision transformer in accuracy by 1.30%, 1.21%, 7.54%, and 3.60%, respectively, while still meeting real-time requirements, thereby balancing accuracy and speed.
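The "screening out of discernible regions" described in the abstract can be illustrated with a minimal sketch: keep only the patch tokens that receive the most attention from the class token, and discard the rest. This is an illustrative assumption about how such a sparse attention module might work, not the paper's actual implementation; the function name, tensor shapes, and parameter `k` are all hypothetical.

```python
import numpy as np

def select_discriminative_tokens(attn, tokens, k=8):
    """Keep the k patch tokens most attended to by the class token.

    attn:   (heads, seq, seq) attention weights from one transformer block
    tokens: (seq, dim) token embeddings; index 0 is the class token
    """
    # Attention the class token pays to each patch, averaged over heads
    cls_to_patch = attn[:, 0, 1:].mean(axis=0)   # shape (seq - 1,)
    # Indices of the k most-attended ("most discernible") patches
    top_k = np.argsort(cls_to_patch)[::-1][:k]
    # Return the corresponding patch embeddings
    return tokens[1:][top_k]                     # shape (k, dim)

rng = np.random.default_rng(0)
attn = rng.random((4, 17, 17))   # 4 heads, 16 patches + 1 class token
tokens = rng.random((17, 32))    # 32-dim token embeddings
picked = select_discriminative_tokens(attn, tokens, k=8)
print(picked.shape)              # (8, 32)
```

In a real model the retained tokens would then feed a classification head, so gradients concentrate on the regions that distinguish visually similar vehicle classes.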
Pages: 18
Related Papers
50 records
  • [31] The Application of Vision Transformer in Image Classification
    He, Zhixuan
    2022 THE 6TH INTERNATIONAL CONFERENCE ON VIRTUAL AND AUGMENTED REALITY SIMULATIONS, ICVARS 2022, 2022: 56 - 63
  • [32] Visual simultaneous localization and mapping (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes
    Chen, Mengyuan
    Guo, Hangrong
    Qian, Runbang
    Gong, Guangqiang
    Cheng, Hao
    MECHANICAL SCIENCES, 2024, 15 (01) : 1 - 16
  • [33] Vision Transformer for femur fracture classification
    Tanzi, Leonardo
    Audisio, Andrea
    Cirrincione, Giansalvo
    Aprato, Alessandro
    Vezzetti, Enrico
    INJURY-INTERNATIONAL JOURNAL OF THE CARE OF THE INJURED, 2022, 53 (07) : 2625 - 2634
  • [34] An ensemble learning integration of multiple CNN with improved vision transformer models for pest classification
    Xia, Wanshang
    Han, Dezhi
    Li, Dun
    Wu, Zhongdai
    Han, Bing
    Wang, Junxiang
    ANNALS OF APPLIED BIOLOGY, 2023, 182 (02) : 144 - 158
  • [35] DENSEVIT: A HYBRID CNN-VISION TRANSFORMER MODEL FOR AN IMPROVED MULTISENSOR LITHOLOGICAL CLASSIFICATION
    Appiah-Twum, Michael
    Xu, Wenbo
    Acheampong, Edward Mensah
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024: 3418 - 3422
  • [36] A Video Vehicle Detection Algorithm Based on Improved Adaboost Algorithm
    Liu, Weiguang
    Zhang, Qian
    PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON MANAGEMENT, EDUCATION, INFORMATION AND CONTROL (MEICI 2016), 2016, 135 : 544 - 548
  • [37] Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer
    Song, Bofan
    Raj, Dharma K. C.
    Yang, Rubin Yuchan
    Li, Shaobai
    Zhang, Chicheng
    Liang, Rongguang
    CANCERS, 2024, 16 (05)
  • [38] Heart sound classification based on bispectrum features and Vision Transformer mode
    Liu, Zeye
    Jiang, Hong
    Zhang, Fengwen
    Ouyang, Wenbin
    Li, Xiaofei
    Pan, Xiangbin
    ALEXANDRIA ENGINEERING JOURNAL, 2023, 85 : 49 - 59
  • [39] HaViT: Hybrid-Attention Based Vision Transformer for Video Classification
    Li, Li
    Zhuang, Liansheng
    Gao, Shenghua
    Wang, Shafei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 502 - 517
  • [40] Encrypted traffic classification based on fusion of vision transformer and temporal features
    Wang Lanting
    Hu Wei
    Liu Jianyi
    Pang Jin
    Gao Yating
    Xue Jingyao
    Zhang Jie
    The Journal of China Universities of Posts and Telecommunications, 2023, 30 (02) : 73 - 82