V-DETR: Pure Transformer for End-to-End Object Detection

Authors
Dung Nguyen [1]
Van-Dung Hoang [2]
Van-Tuong-Lan Le [3]
Affiliations
[1] Hue Univ, Hue Univ Sci, Hue City 530000, Vietnam
[2] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[3] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam
Keywords
Computer Vision; Object Detection; Classification; Convolutional Neural Networks; Deep Learning; DETR; ViT;
DOI
10.1007/978-981-97-4985-0_10
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Object detection is one of the central tasks in computer vision, with many challenges and practical applications. It aims to classify objects in images or videos and determine their locations. Many machine learning methods, especially deep learning methods, have been developed for this task. This article introduces a model combining DETR (DEtection TRansformer) and ViT (Vision Transformer) that recognizes objects in images and videos using only components of the Transformer model. DETR achieves good object detection results with the Transformer architecture and without complex intermediate steps. ViT, a Transformer-based architecture, has brought about a breakthrough in image classification. Combining the two architectures opens promising prospects in computer vision. Features are extracted automatically from the input image by a ViT model pre-trained on the ImageNet21K dataset and are then fed into the Transformer model to predict the class and bounding box of each object. Experimental results on test datasets show that the combined model recognizes objects better than DETR or ViT alone. This suggests that the Transformer model is applicable not only to natural language processing but also to image classification and object detection. The proposed model reaches mAP@0.5 = 0.444, slightly better than the original DETR model. The code is available at https://github.com/nguyendung622/vitdetr.
Pages: 120-131
Page count: 12
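
Note on the metric in the abstract: mAP@0.5 conventionally means that a predicted box counts as correct only when its class matches the ground truth and its intersection over union (IoU) with the ground-truth box is at least 0.5. This record does not describe the authors' evaluation protocol, so the following is only a minimal sketch of that matching criterion under the usual convention. The function names `iou` and `is_true_positive` and the (x_min, y_min, x_max, y_max) box layout are illustrative assumptions, not taken from the paper or its repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max).

    The box layout is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes yield zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, pred_label, gt_box, gt_label, threshold=0.5):
    """Hypothetical helper: a prediction matches when labels agree and IoU >= threshold."""
    return pred_label == gt_label and iou(pred_box, gt_box) >= threshold


# A prediction shifted by 2 px against a 10x10 ground-truth box still matches at 0.5;
# one shifted by 5 px does not.
gt = (0, 0, 10, 10)
print(is_true_positive((2, 0, 12, 10), "cat", gt, "cat"))
print(is_true_positive((5, 0, 15, 10), "cat", gt, "cat"))
```

A full mAP computation additionally ranks predictions by confidence, builds a precision-recall curve per class, and averages the resulting average precision over classes; the sketch covers only the per-box matching step.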