Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

Citations: 0
Authors
Feng, Yifan [1]
Huang, Jiangang [2]
Du, Shaoyi [2]
Ying, Shihui [3]
Yong, Jun-Hai [1]
Li, Yipeng [4]
Ding, Guiguang [1]
Ji, Rongrong [5]
Gao, Yue [1]
Affiliations
[1] Tsinghua Univ, Sch Software, BLBCI, THUIBCS,BNRist, Beijing 100084, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Coll Artificial Intelligence, Xian 710049, Peoples R China
[3] Shanghai Univ, Sch Sci, Dept Math, Shanghai 200444, Peoples R China
[4] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[5] Xiamen Univ, Media Analyt & Comp Lab, Dept Artificial Intelligence, Sch Informat Inst Artificial Intelligence,Fujian, Xiamen 361005, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
Feature extraction; YOLO; Semantics; Neck; Computational modeling; Visualization; Correlation; Computer vision; Scattering; Electronic mail; Hypergraph; hypergraph computation; hypergraph neural networks; object detection
DOI
10.1109/TPAMI.2024.3524377
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation. This enables the model to acquire both semantic and structural information, advancing beyond conventional feature-focused learning. Hyper-YOLO incorporates the proposed Mixed Aggregation Network (MANet) in its backbone for enhanced feature extraction and introduces the Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net) in its neck. HyperC2Net operates across five scales and breaks free from traditional grid structures, allowing for sophisticated high-order interactions across levels and positions. This synergy of components positions Hyper-YOLO as a state-of-the-art architecture in various scale models, as evidenced by its superior performance on the COCO dataset. Specifically, Hyper-YOLO-N significantly outperforms the advanced YOLOv8-N and YOLOv9-T with 12% AP(val) and 9% AP(val) improvements.
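The collect, propagate, scatter pattern described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how hyperedges are built or how messages are weighted, so the distance-threshold (epsilon-ball) hyperedges, the mean aggregation, the residual connection, and all function names below are assumptions chosen for illustration. Real feature maps are also replaced by short lists of vectors, and learnable weights are omitted.

```python
import math

def collect(levels):
    """Gather per-level feature vectors into one flat vertex set (assumed 'collecting' step)."""
    flat = [vec for level in levels for vec in level]
    sizes = [len(level) for level in levels]
    return flat, sizes

def build_epsilon_ball_incidence(features, eps):
    """Assumed construction: hyperedge e contains every vertex within eps of vertex e."""
    n = len(features)
    return [[1.0 if math.dist(features[v], features[e]) <= eps else 0.0
             for e in range(n)]
            for v in range(n)]

def hypergraph_propagate(features, incidence):
    """One parameter-free step: vertices -> hyperedges -> vertices, plus a residual."""
    n_vertices = len(features)
    n_edges = len(incidence[0])
    dim = len(features[0])

    # Vertex-to-hyperedge: each hyperedge takes the mean of its member vertices.
    edge_feats = []
    for e in range(n_edges):
        members = [v for v in range(n_vertices) if incidence[v][e] > 0]
        edge_feats.append([sum(features[v][d] for v in members) / len(members)
                           for d in range(dim)])

    # Hyperedge-to-vertex: each vertex takes the mean of its incident hyperedges,
    # then adds it to its own feature (residual connection, an assumption here).
    out = []
    for v in range(n_vertices):
        edges = [e for e in range(n_edges) if incidence[v][e] > 0]
        message = [sum(edge_feats[e][d] for e in edges) / len(edges)
                   for d in range(dim)]
        out.append([features[v][d] + message[d] for d in range(dim)])
    return out

def scatter(flat, sizes):
    """Split the flat vertex set back into its original levels (assumed 'scattering' step)."""
    levels, start = [], 0
    for size in sizes:
        levels.append(flat[start:start + size])
        start += size
    return levels

# Toy input: two "levels" with 2-D features; the first two vectors are close together.
levels = [[[0.0, 0.0], [0.1, 0.0]], [[5.0, 5.0]]]
flat, sizes = collect(levels)
incidence = build_epsilon_ball_incidence(flat, eps=0.5)
result = scatter(hypergraph_propagate(flat, incidence), sizes)
print(result)
```

Because every vertex lies inside its own epsilon-ball, each hyperedge and each vertex neighbourhood is non-empty, so the two mean computations never divide by zero. In this toy input the two nearby vectors share hyperedges and exchange information, while the distant vector only interacts with itself.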
Pages: 2388-2401
Page count: 14