V-DETR: Pure Transformer for End-to-End Object Detection

Citations: 0
Authors
Dung Nguyen [1 ]
Van-Dung Hoang [2 ]
Van-Tuong-Lan Le [3 ]
Affiliations
[1] Hue Univ, Hue Univ Sci, Hue City 530000, Vietnam
[2] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[3] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam
Keywords
Computer Vision; Object Detection; Classification; Convolutional Neural Networks; Deep Learning; DETR; ViT
DOI
10.1007/978-981-97-4985-0_10
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
In computer vision, object detection is one of the most important tasks, with many challenges and practical applications. Its goal is to classify the objects in images or videos and to localize them. Many machine learning methods, especially deep learning methods, have been developed for this task. This article introduces a model that combines DETR (DEtection TRansformer) and ViT (Vision Transformer) to recognize objects in images and videos using only components of the Transformer model. DETR achieves good object detection results with the Transformer architecture and without complex intermediate steps, while ViT, a Transformer-based architecture, has brought a breakthrough in image classification. Combining the two architectures opens exciting prospects in computer vision. Features are automatically extracted from the input image by a ViT model pretrained on the ImageNet-21k dataset, and these features are then fed into the Transformer encoder-decoder to predict the classes and bounding boxes of the objects. Experimental results on the test datasets show that this combined model recognizes objects better than DETR or ViT alone. This suggests important prospects for applying the Transformer model not only in natural language processing but also in image classification and object detection. Our proposed model reaches a fairly high accuracy of mAP@0.5 = 0.444, slightly better than the original DETR model. The code is available at https://github.com/nguyendung622/vitdetr.
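The abstract describes replacing DETR's CNN backbone with ViT patch features that are then passed through a Transformer encoder-decoder with learned object queries. The sketch below illustrates that pipeline in PyTorch with timm; it is not the authors' released implementation (see https://github.com/nguyendung622/vitdetr), and the ViT variant, hidden size, number of queries, and the omission of positional encodings and the Hungarian matching loss are all simplifying assumptions.

    # Minimal sketch of a ViT-backbone + DETR-style detector (assumed configuration,
    # forward pass only; training would additionally need DETR's Hungarian matching loss).
    import torch
    import torch.nn as nn
    import timm

    class ViTDETRSketch(nn.Module):
        def __init__(self, num_classes=91, num_queries=100, hidden_dim=256):
            super().__init__()
            # ViT backbone pretrained on ImageNet-21k (variant is an assumption).
            self.backbone = timm.create_model(
                "vit_base_patch16_224.augreg_in21k", pretrained=True)
            # Project ViT token dimension down to the detection Transformer width.
            self.input_proj = nn.Linear(self.backbone.embed_dim, hidden_dim)
            # DETR-style encoder-decoder over the patch tokens.
            self.transformer = nn.Transformer(
                d_model=hidden_dim, nhead=8,
                num_encoder_layers=6, num_decoder_layers=6,
                batch_first=True)
            # Learned object queries, one per predicted box slot.
            self.query_embed = nn.Embedding(num_queries, hidden_dim)
            # Prediction heads: class logits (+1 for "no object") and normalized boxes.
            self.class_head = nn.Linear(hidden_dim, num_classes + 1)
            self.bbox_head = nn.Linear(hidden_dim, 4)

        def forward(self, images):                              # (B, 3, 224, 224)
            tokens = self.backbone.forward_features(images)     # (B, 1 + N, D)
            tokens = self.input_proj(tokens[:, 1:])             # drop the CLS token
            queries = self.query_embed.weight.unsqueeze(0).repeat(
                images.size(0), 1, 1)                           # (B, Q, hidden_dim)
            hs = self.transformer(tokens, queries)              # (B, Q, hidden_dim)
            return self.class_head(hs), self.bbox_head(hs).sigmoid()

    if __name__ == "__main__":
        model = ViTDETRSketch()
        logits, boxes = model(torch.randn(1, 3, 224, 224))
        print(logits.shape, boxes.shape)   # (1, 100, 92), (1, 100, 4)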
Pages: 120-131 (12 pages)