V-DETR: Pure Transformer for End-to-End Object Detection

Cited by: 0
Authors
Dung Nguyen [1 ]
Van-Dung Hoang [2 ]
Van-Tuong-Lan Le [3 ]
Affiliations
[1] Hue Univ, Hue Univ Sci, Hue City 530000, Vietnam
[2] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[3] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam
Keywords
Computer Vision; Object Detection; Classification; Convolutional Neural Networks; Deep Learning; DETR; ViT
DOI
10.1007/978-981-97-4985-0_10
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In computer vision, object detection is an important task with many challenges and practical applications: it classifies objects and localizes them in images or videos. Many machine learning methods, especially deep learning methods, have been developed for this task. This article introduces a model combining DETR (DEtection TRansformer) and ViT (Vision Transformer) as a method for detecting objects in images and videos using only components of the Transformer model. The DETR model achieves good object-detection results with the Transformer architecture and without complex intermediate steps. The ViT model, a Transformer-based architecture, has brought a breakthrough in image classification. Combining both architectures opens exciting prospects in computer vision. Features are first extracted from the input image by a ViT model pretrained on the ImageNet-21k dataset; these features are then fed into the Transformer model to predict the class and bounding box of each object. Experimental results on the test datasets show that this combined model detects objects better than DETR or ViT alone. This brings important prospects for applying the Transformer model not only in natural language processing but also in image classification and object detection. Our proposed model achieves a fairly high accuracy of mAP@0.5 = 0.444, slightly better than the original DETR model. The code is available at https://github.com/nguyendung622/vitdetr.
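The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the described ViT-backbone-plus-DETR-head idea, not the authors' implementation (see the linked repository for that). It assumes a timm ViT-Base backbone pretrained on ImageNet-21k, a standard PyTorch Transformer decoder over learned object queries, and DETR-style class and box heads; all names and hyperparameters here are illustrative.

```python
# Sketch only: ViT patch tokens serve as the "memory" that a DETR-style decoder
# attends to with a fixed set of learned object queries. Hyperparameters are
# assumptions, not values from the paper.
import torch
import torch.nn as nn
import timm


class ViTDETRSketch(nn.Module):
    def __init__(self, num_classes=91, num_queries=100, hidden_dim=256):
        super().__init__()
        # ViT backbone pretrained on ImageNet-21k, used as a feature extractor.
        self.backbone = timm.create_model(
            "vit_base_patch16_224.augreg_in21k", pretrained=True, num_classes=0
        )
        embed_dim = self.backbone.embed_dim  # 768 for ViT-Base
        self.input_proj = nn.Linear(embed_dim, hidden_dim)
        # DETR-style decoder over learned object queries.
        decoder_layer = nn.TransformerDecoderLayer(
            d_model=hidden_dim, nhead=8, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
        self.query_embed = nn.Embedding(num_queries, hidden_dim)
        # Prediction heads: class logits (+1 for "no object") and (cx, cy, w, h) boxes.
        self.class_head = nn.Linear(hidden_dim, num_classes + 1)
        self.bbox_head = nn.Linear(hidden_dim, 4)

    def forward(self, images):  # images: (B, 3, 224, 224)
        tokens = self.backbone.forward_features(images)   # (B, N, embed_dim)
        memory = self.input_proj(tokens)                   # (B, N, hidden_dim)
        queries = self.query_embed.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        hs = self.decoder(queries, memory)                 # (B, num_queries, hidden_dim)
        return self.class_head(hs), self.bbox_head(hs).sigmoid()


if __name__ == "__main__":
    model = ViTDETRSketch()
    logits, boxes = model(torch.randn(2, 3, 224, 224))
    print(logits.shape, boxes.shape)  # (2, 100, 92) (2, 100, 4)
```

In a full DETR-style setup, the class logits and normalized boxes produced above would be matched to ground truth with bipartite (Hungarian) matching during training; that loss is omitted from this sketch.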
Pages: 120-131
Page count: 12