Object Detection Difficulty: Suppressing Over-aggregation for Faster and Better Video Object Detection

Authors
Zhang, Bingqing [1]
Wang, Sen [2]
Liu, Yifan [3]
Kusy, Brano [4]
Li, Xue [2]
Liu, Jiajun [4]
Affiliations
[1] Renmin Univ China, Beijing, Peoples R China
[2] Univ Queensland, Brisbane, Qld, Australia
[3] Univ Adelaide, Adelaide, SA, Australia
[4] CSIRO, Data 61, Brisbane, Qld, Australia
Keywords
Video Object Detection; Efficient Video Perception; Object Detection Metrics; Feature Aggregation / Fusion
DOI
10.1145/3581783.3612090
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current video object detection (VOD) models often encounter issues with over-aggregation due to redundant aggregation strategies, which perform feature aggregation on every frame. This results in suboptimal performance and increased computational complexity. In this work, we propose an image-level Object Detection Difficulty (ODD) metric to quantify the difficulty of detecting objects in a given image. The derived ODD scores can be used in the VOD process to mitigate over-aggregation. Specifically, we train an ODD predictor as an auxiliary head of a still-image object detector to compute the ODD score for each image based on the discrepancies between detection results and ground-truth bounding boxes. The ODD score enhances the VOD system in two ways: 1) it enables the VOD system to select superior global reference frames, thereby improving overall accuracy; and 2) it serves as an indicator in the newly designed ODD Scheduler to eliminate the aggregation of frames that are easy to detect, thus accelerating the VOD process. Comprehensive experiments demonstrate that, when utilized for selecting global reference frames, ODD-VOD consistently enhances the accuracy of Global-frame-based VOD models. When employed for acceleration, ODD-VOD consistently improves the frames per second (FPS) by an average of 73.3% across 8 different VOD models without sacrificing accuracy. When combined, ODD-VOD attains state-of-the-art performance when competing with many VOD methods in both accuracy and speed. Our work represents a significant advancement towards making VOD more practical for real-world applications. The code will be released at https://github.com/bingqingzhang/odd-vod.
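The two uses of the ODD score described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the fixed threshold, and the choice of the lowest-scoring (easiest) frames as global references are assumptions made for the example, and the ODD scores are supplied by hand rather than produced by a trained predictor.

```python
from typing import List, Sequence


def select_reference_frames(odd_scores: Sequence[float], k: int) -> List[int]:
    """Pick k global reference frames by ODD score.

    Assumption (not stated in the abstract): the easiest frames, i.e. those
    with the lowest ODD scores, are taken as the reference frames.
    """
    order = sorted(range(len(odd_scores)), key=lambda i: odd_scores[i])
    return sorted(order[:k])  # return indices in temporal order


def schedule_aggregation(odd_scores: Sequence[float], threshold: float) -> List[bool]:
    """Decide, per frame, whether feature aggregation should run.

    Frames whose ODD score is below the (hypothetical) threshold are treated
    as easy and keep the still-image detector's output; the rest are
    aggregated.
    """
    return [score >= threshold for score in odd_scores]


# Hand-written ODD scores for a five-frame clip (hypothetical values).
odd_scores = [0.10, 0.80, 0.30, 0.65, 0.05]

reference_ids = select_reference_frames(odd_scores, k=2)
aggregate_mask = schedule_aggregation(odd_scores, threshold=0.5)

print(reference_ids)   # indices of the chosen reference frames
print(aggregate_mask)  # True where aggregation would run
```

With these scores, only the two hard frames would go through aggregation, which is the source of the speed-up the abstract reports; the actual scheduling rule and threshold used in ODD-VOD may differ.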
Pages: 1768-1778
Page count: 11