Lightweight CNN-ViT with cross-module representational constraint for express parcel detection

Cited by: 0
Authors
Guowei Zhang [1]
Wuzhi Li [1]
Yutong Tang [1]
Shuixuan Chen [1]
Li Wang [2]
Affiliations
[1] Xiamen University of Technology, School of Mechanical and Automotive Engineering
[2] Shunfeng Technology Co., Ltd., Research and Development Department
Keywords
Edge devices; Hybrid model; Transformers; Convolutional neural network; Object detection
DOI
10.1007/s00371-024-03602-0
Abstract
The express parcel (EP) detection model must be deployed on edge devices with limited computing capability, so a lightweight and efficient object detection model is essential. In this work, we introduce CMViT, a novel lightweight CNN-ViT with a cross-module representational constraint designed specifically for EP detection. In CMViT, we draw on the concept of cross-attention from multimodal models and propose a new cross-module attention (CMA) encoder. Local features are provided by the proposed lightweight shuffle block (LSBlock), and the CMA encoder flexibly connects local and global features from the hybrid CNN-ViT model through self-attention, constructing a robust dependency between local and global features and thereby effectively enlarging the model's receptive field. Furthermore, LSBlock provides effective guidance and constraints for the CMA encoder, avoiding unnecessary attention to redundant information and reducing computational cost. In EP detection, compared to YOLOv8s, CMViT achieves 99% mean accuracy with 25% of the input resolution, 54.5% of the parameters, and 14.7% of the FLOPs, showing superior performance and promising applications. In more challenging object detection tasks, CMViT also performs strongly, achieving 28.8 mAP at 2.2G MAdds on the COCO dataset and outperforming MobileViT by 4% in accuracy while consuming less computational power. Code is available at: https://github.com/Acc2386/CMViT.
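The cross-attention idea that the abstract borrows from multimodal models can be illustrated with a minimal sketch. This is not the authors' CMA encoder: it is plain scaled dot-product cross-attention with no learned projections, and it assumes (the abstract does not say) that the global ViT tokens act as queries while the local CNN features act as keys and values. The names `cross_attention`, `global_tokens`, and `local_feats` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: list of n_q vectors of dimension d
             (assumption: global ViT tokens)
    keys, values: lists of n_k vectors of dimension d
             (assumption: local CNN features guiding the attention)
    Returns (outputs, weights), one row per query.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, channel by channel.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy inputs: two "global" tokens attend over three "local" feature vectors.
global_tokens = [[1.0, 0.0], [0.0, 1.0]]
local_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, attn = cross_attention(global_tokens, local_feats, local_feats)
```

In this toy setup each global token places most of its attention weight on the local feature it is most aligned with, which is one way to picture local features constraining where a global encoder attends. The repository linked in the abstract is the reference for the actual design.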
Pages: 3283-3295
Number of pages: 12
Related papers
1 in total
  • [1] Lightweight CNN-ViT with cross-module representational constraint for express parcel detection
    Zhang, Guowei
    Li, Wuzhi
    Tang, Yutong
    Chen, Shuixuan
    Wang, Li
    The Visual Computer, 2024