Transformers only look once with nonlinear combination for real-time object detection

Cited by: 0
Authors
Ruiyang Xia
Guoquan Li
Zhengwen Huang
Yu Pang
Man Qi
Affiliations
[1] Chongqing University of Posts and Telecommunications, School of Communication and Information Engineering
[2] BUL-CQUPT Innovation Institute, Group of Artificial Intelligence and System Optimization
[3] Brunel University London, Department of Electronic and Electrical Engineering
[4] Chongqing University of Posts and Telecommunications, Key Laboratory of Photoelectric Information Sensing and Transmission Technology
[5] Canterbury Christ Church University, School of Engineering
[6] Kent, England
Keywords
Real-time object detector; TOLO; Vision transformer networks; Nonlinear combination
DOI
Not available
Abstract
This article proposes Transformers Only Look Once (TOLO), a novel real-time object detector that addresses two problems. First, many modern real-time object detectors are inefficient at building long-distance dependencies among local features. Second, vision Transformer networks lack inductive biases and carry a heavy computational cost. TOLO consists of a Convolutional Neural Network (CNN) backbone, a Feature Fusion Neck (FFN), and different Lite Transformer Heads (LTHs). The backbone transfers inductive biases, the neck supplies the extracted features with high-resolution and high-semantic properties, and the heads efficiently mine multiple long-distance dependencies with less memory overhead for detection. Moreover, to recover the many potentially correct boxes during prediction, we propose a simple and efficient nonlinear combination of the object confidence and the classification score. Experiments on the PASCAL VOC 2007, PASCAL VOC 2012, and MS COCO 2017 datasets demonstrate that TOLO significantly outperforms other state-of-the-art methods with a small input size. In addition, the proposed nonlinear combination further improves the detection performance of TOLO by boosting the scores of potentially correct predicted boxes without lengthening training or adding model parameters.
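The abstract does not state the form of the nonlinear combination, so the following is only an illustrative sketch under an assumption: it contrasts the conventional product of object confidence and classification score with a weighted geometric mean. The function names, the `alpha` exponent, and the sample values are all hypothetical and are not taken from the paper. The point is that a nonlinear rule applied at prediction time can reorder boxes without any retraining or extra parameters.

```python
def linear_score(obj_conf: float, cls_score: float) -> float:
    """Conventional combination: plain product of the two scores."""
    return obj_conf * cls_score


def nonlinear_score(obj_conf: float, cls_score: float, alpha: float = 0.3) -> float:
    """Assumed nonlinear combination (NOT the paper's formula):
    a weighted geometric mean, obj_conf**alpha * cls_score**(1 - alpha)."""
    for value in (obj_conf, cls_score, alpha):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores and alpha must lie in [0, 1]")
    return (obj_conf ** alpha) * (cls_score ** (1.0 - alpha))


def rank_boxes(boxes, score_fn):
    """Return box names sorted from highest to lowest combined score."""
    ordered = sorted(
        boxes,
        key=lambda b: score_fn(b["obj_conf"], b["cls_score"]),
        reverse=True,
    )
    return [b["name"] for b in ordered]


# Hypothetical predictions: A has high objectness but a weak class score,
# B has modest objectness but a stronger class score.
boxes = [
    {"name": "A", "obj_conf": 0.9, "cls_score": 0.3},
    {"name": "B", "obj_conf": 0.4, "cls_score": 0.6},
]

print(rank_boxes(boxes, linear_score))
print(rank_boxes(boxes, nonlinear_score))
```

With these sample values the product ranks A above B, while the assumed geometric mean with `alpha = 0.3` ranks B above A. Note that `alpha = 0.5` would be a monotone transform of the product and leave the ranking unchanged, which is why the sketch uses an asymmetric exponent.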
Pages: 12571-12585
Number of pages: 15
Source
Xia, Ruiyang; Li, Guoquan; Huang, Zhengwen; Pang, Yu; Qi, Man. Transformers only look once with nonlinear combination for real-time object detection [J]. NEURAL COMPUTING & APPLICATIONS, 2022, 34 (15): 12571-12585