Vehicle Classification Algorithm Based on Improved Vision Transformer

Cited by: 1
Authors
Dong, Xinlong [1 ]
Shi, Peicheng [1 ]
Tang, Yueyue [1 ]
Yang, Li [1 ]
Yang, Aixi [2 ]
Liang, Taonian [3 ]
Affiliations
[1] Anhui Polytech Univ, Sch Mech & Automot Engn, Wuhu 241000, Peoples R China
[2] Zhejiang Univ, Polytech Inst, Hangzhou 310015, Peoples R China
[3] Chery New Energy Automobile Co Ltd, Wuhu 241000, Peoples R China
Source
WORLD ELECTRIC VEHICLE JOURNAL | 2024, Vol. 15, No. 08
Keywords
vehicle classification; vision transformer; local detail features; sparse attention module; contrast loss;
DOI
10.3390/wevj15080344
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Code
0808; 0809
Abstract
Vehicle classification is one of the foundational technologies of automatic driving. With the development of deep learning, vision transformer architectures based on attention mechanisms can represent global information quickly and effectively. However, because the input image is directly segmented into patches, local detail features and information are lost. To address this problem, we propose an improved vision transformer vehicle classification network (IND-ViT). Specifically, we first design a CNN-In D branch module that extracts local features before image segmentation, compensating for the detail information lost by the vision transformer. Then, to reduce misclassification caused by the high visual similarity of some vehicles, we propose a sparse attention module that screens out the discernible regions of the image and further improves the model's ability to represent detailed features. Finally, a contrastive loss function is used to increase the intra-class consistency and inter-class difference of the classification features, improving vehicle classification accuracy. Experimental results show that, on the BIT-Vehicle, CIFAR-10, Oxford Flowers-102, and Caltech-101 datasets, the proposed model outperforms the original vision transformer in accuracy by 1.30%, 1.21%, 7.54%, and 3.60%, respectively, while still meeting real-time requirements, thereby balancing accuracy and speed.
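The "screening out of discernible regions" described in the abstract can be illustrated with a minimal sketch: keep only the patch tokens that receive the most attention from the class token, and discard the rest. This is an illustrative assumption about how such a sparse attention module might work, not the paper's actual implementation; the function name, tensor shapes, and parameter `k` are all hypothetical.

```python
import numpy as np

def select_discriminative_tokens(attn, tokens, k=8):
    """Keep the k patch tokens most attended to by the class token.

    attn:   (heads, seq, seq) attention weights from one transformer block
    tokens: (seq, dim) token embeddings; index 0 is the class token
    """
    # Attention the class token pays to each patch, averaged over heads
    cls_to_patch = attn[:, 0, 1:].mean(axis=0)   # shape (seq - 1,)
    # Indices of the k most-attended ("most discernible") patches
    top_k = np.argsort(cls_to_patch)[::-1][:k]
    # Return the corresponding patch embeddings
    return tokens[1:][top_k]                     # shape (k, dim)

rng = np.random.default_rng(0)
attn = rng.random((4, 17, 17))   # 4 heads, 16 patches + 1 class token
tokens = rng.random((17, 32))    # 32-dim token embeddings
picked = select_discriminative_tokens(attn, tokens, k=8)
print(picked.shape)              # (8, 32)
```

In a real model the retained tokens would then feed a classification head, so gradients concentrate on the regions that distinguish visually similar vehicle classes.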
Pages: 18
Related Papers
50 records
  • [31] The Application of Vision Transformer in Image Classification
    He, Zhixuan
    2022 THE 6TH INTERNATIONAL CONFERENCE ON VIRTUAL AND AUGMENTED REALITY SIMULATIONS, ICVARS 2022, 2022: 56 - 63
  • [32] Visual simultaneous localization and mapping (vSLAM) algorithm based on improved Vision Transformer semantic segmentation in dynamic scenes
    Chen, Mengyuan
    Guo, Hangrong
    Qian, Runbang
    Gong, Guangqiang
    Cheng, Hao
    MECHANICAL SCIENCES, 2024, 15 (01) : 1 - 16
  • [33] Vision Transformer for femur fracture classification
    Tanzi, Leonardo
    Audisio, Andrea
    Cirrincione, Giansalvo
    Aprato, Alessandro
    Vezzetti, Enrico
    INJURY-INTERNATIONAL JOURNAL OF THE CARE OF THE INJURED, 2022, 53 (07) : 2625 - 2634
  • [34] An ensemble learning integration of multiple CNN with improved vision transformer models for pest classification
    Xia, Wanshang
    Han, Dezhi
    Li, Dun
    Wu, Zhongdai
    Han, Bing
    Wang, Junxiang
    ANNALS OF APPLIED BIOLOGY, 2023, 182 (02) : 144 - 158
  • [35] DENSEVIT: A HYBRID CNN-VISION TRANSFORMER MODEL FOR AN IMPROVED MULTISENSOR LITHOLOGICAL CLASSIFICATION
    Appiah-Twum, Michael
    Xu, Wenbo
    Acheampong, Edward Mensah
    IGARSS 2024-2024 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, IGARSS 2024, 2024: 3418 - 3422
  • [36] A Video Vehicle Detection Algorithm Based on Improved Adaboost Algorithm
    Liu, Weiguang
    Zhang, Qian
    PROCEEDINGS OF THE 2016 6TH INTERNATIONAL CONFERENCE ON MANAGEMENT, EDUCATION, INFORMATION AND CONTROL (MEICI 2016), 2016, 135 : 544 - 548
  • [37] Classification of Mobile-Based Oral Cancer Images Using the Vision Transformer and the Swin Transformer
    Song, Bofan
    Raj, Dharma K. C.
    Yang, Rubin Yuchan
    Li, Shaobai
    Zhang, Chicheng
    Liang, Rongguang
    CANCERS, 2024, 16 (05)
  • [38] Heart sound classification based on bispectrum features and Vision Transformer mode
    Liu, Zeye
    Jiang, Hong
    Zhang, Fengwen
    Ouyang, Wenbin
    Li, Xiaofei
    Pan, Xiangbin
    ALEXANDRIA ENGINEERING JOURNAL, 2023, 85 : 49 - 59
  • [39] HaViT: Hybrid-Attention Based Vision Transformer for Video Classification
    Li, Li
    Zhuang, Liansheng
    Gao, Shenghua
    Wang, Shafei
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 502 - 517
  • [40] Encrypted traffic classification based on fusion of vision transformer and temporal features
    Wang Lanting
    Hu Wei
    Liu Jianyi
    Pang Jin
    Gao Yating
    Xue Jingyao
    Zhang Jie
    The Journal of China Universities of Posts and Telecommunications, 2023, 30 (02) : 73 - 82