V-DETR: Pure Transformer for End-to-End Object Detection

Authors
Dung Nguyen [1]
Van-Dung Hoang [2]
Van-Tuong-Lan Le [3]
Affiliations
[1] Hue Univ, Hue Univ Sci, Hue City 530000, Vietnam
[2] HCMC Univ Technol & Educ, Fac Informat Technol, Ho Chi Minh City 720000, Vietnam
[3] Hue Univ, Dept Acad & Students Affairs, Hue City 530000, Vietnam
Keywords
Computer Vision; Object Detection; Classification; Convolutional Neural Networks; Deep Learning; DETR; ViT
DOI
10.1007/978-981-97-4985-0_10
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Object detection is one of the central tasks in computer vision, with many challenges and practical applications. Its goal is to classify objects in images or videos and determine their locations. Many machine learning methods, especially deep learning methods, have been developed for this task. This article introduces a model that combines DETR (DEtection TRansformer) and ViT (Vision Transformer) to recognize objects in images and videos using only components of the Transformer model. The DETR model achieves good results in object detection with the Transformer architecture and without complex intermediate steps. The ViT model, a Transformer-based architecture, has brought about a breakthrough in image classification. Combining the two architectures opens promising prospects in computer vision. Features are extracted automatically from the input image by a ViT model pre-trained on the ImageNet21K dataset; these features are then fed into the Transformer model to predict the class and bounding box of each object. Experimental results on test data sets show that the combined model recognizes objects better than DETR and ViT alone. This points to important prospects for applying the Transformer model not only in natural language processing but also in image classification and object detection. Our proposed model achieves mAP@0.5 = 0.444, slightly better than the original DETR model. The code is available at https://github.com/nguyendung622/vitdetr.
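For readers unfamiliar with the mAP@0.5 figure quoted in the abstract: under the usual convention, a predicted box counts as a correct detection when its class matches the ground truth and its intersection over union (IoU) with the ground-truth box is at least 0.5. The following is a minimal illustrative sketch of that matching test only, not the authors' evaluation code. The names `iou` and `is_true_positive` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any)
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, pred_label: str, gt_label: str,
                     threshold: float = 0.5) -> bool:
    """Hypothetical helper: a prediction matches when the labels agree
    and the IoU reaches the threshold (0.5 for mAP@0.5)."""
    return pred_label == gt_label and iou(pred, gt) >= threshold
```

A full mAP computation additionally ranks predictions by confidence and averages precision over recall levels and classes; that part is omitted here.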
Pages: 120-131
Page count: 12