Multiple-Level Distillation for Video Fine-Grained Accident Detection

Cited by: 0
Authors
Yu, Hongyang [1]
Zhang, Xinfeng [2]
Wang, Yaowei [1]
Huang, Qingming [2]
Yin, Baocai [1, 3]
Affiliations
[1] Peng Cheng Lab, Shenzhen 518066, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 100039, Peoples R China
[3] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Video accident detection; fine-grained accident detection; knowledge distillation; multiple-level distillation; EVENT DETECTION
DOI
10.1109/TCSVT.2023.3338743
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Accident detection in surveillance or dashcam videos is a common task in video-based traffic accident analysis. However, because accidents occur sparsely and randomly in the real world, accident records are scarcer than the training data available for standard detection tasks such as object detection or instance detection. Moreover, the limited and diverse accident data makes it harder to model accident patterns for fine-grained accident detection, which analyzes an accident in detail. Extra prior information should therefore be introduced, such as common vision features, which offer relatively effective information for many vision tasks. A big model can generate such features, but only by training on abundant data and consuming substantial computing time and resources. Even though accident video data is special, a big model can still extract common vision features from it. Thus, in this paper, we propose to apply knowledge distillation to fine-grained accident detection, which analyzes the spatial-temporal existence and severity of accidents, in order to reduce the computing cost (by distilling into a small model) while keeping good performance under limited accident data. Knowledge distillation offers extra general vision feature information from the pre-trained big model. Common knowledge distillation guides the student network to learn the same representations as the teacher network by logit mimicking or feature imitation. However, single-level distillation focuses on only one aspect, mimicking either the classification logits or the deep features. Fine-grained accident detection requires multiple tasks with different focuses, such as multiple accident classification, temporal-spatial accident region detection, and accident severity estimation. Thus, multiple-level distillation is proposed for the different modules to generate a unified video feature that serves all the tasks in fine-grained accident detection analysis. Experimental results on a fine-grained accident detection dataset, which provides more detailed annotations of accidents, demonstrate that our method can effectively model the video feature for multiple tasks.
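The two single-level distillation styles named in the abstract, logit mimicking and feature imitation, can be illustrated with a minimal sketch. This is a generic textbook formulation and not the authors' implementation: the function names, the temperature value, and the per-level weights are all assumptions made for illustration, and the record does not state how the paper combines its levels.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def logit_mimicking_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def feature_imitation_loss(student_feat, teacher_feat):
    """Mean squared error between two feature vectors of equal length."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def multi_level_loss(level_losses, weights):
    """Weighted sum of per-level losses; the weights are hypothetical."""
    return sum(weights[name] * value for name, value in level_losses.items())


if __name__ == "__main__":
    # Toy values only; real inputs would come from teacher and student networks.
    losses = {
        "logit": logit_mimicking_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0]),
        "feature": feature_imitation_loss([0.2, 0.4], [0.1, 0.5]),
    }
    print(multi_level_loss(losses, {"logit": 1.0, "feature": 0.5}))
```

Summing weighted per-level terms is only one plausible way to distill at several levels at once; in practice each term would be computed on tensors inside a training loop rather than on Python lists.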
Pages: 4445-4457
Page count: 13
Related papers
50 in total
  • [21] Fine-Grained Event Trigger Detection
    Duong Minh Le
    Thien Huu Nguyen
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2745 - 2752
  • [22] Predicting next changes at the fine-grained level
    Murakami, Hiroaki
    Hotta, Keisuke
    Higo, Yoshiki
    Kusumoto, Shinji
    Proceedings - Asia-Pacific Software Engineering Conference, APSEC, 2014, 1 : 119 - 126
  • [23] Learning to Geolocalise Tweets at a Fine-Grained Level
    Paule, Jorge David Gonzalez
    Moshfeghi, Yashar
    Macdonald, Craig
    Ounis, Iadh
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1675 - 1678
  • [24] Partitioning of multiple fine-grained scalable video sequences concurrently streamed to heterogeneous clients
    Hsu, Cheng-Hsin
    Hefeeda, Mohamed
    IEEE TRANSACTIONS ON MULTIMEDIA, 2008, 10 (03) : 457 - 469
  • [25] Fine-grained scalable video caching for heterogeneous clients
    Liu, Jiangchuan
    Xu, Jianliang
    Chu, Xiaowen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2006, 8 (05) : 1011 - 1020
  • [26] Temporal Query Networks for Fine-grained Video Understanding
    Zhang, Chuhan
    Gupta, Ankush
    Zisserman, Andrew
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4484 - 4494
  • [27] Fine-grained talking face generation with video reinterpretation
    Huang, Xin
    Wang, Mingjie
    Gong, Minglun
    VISUAL COMPUTER, 2021, 37 (01): : 95 - 105
  • [28] Spotting Temporally Precise, Fine-Grained Events in Video
    Hong, James
    Zhang, Haotian
    Gharbi, Michael
    Fisher, Matthew
    Fatahalian, Kayvon
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 33 - 51
  • [29] Fine-Grained Video Categorization with Redundancy Reduction Attention
    Zhu, Chen
    Tan, Xiao
    Zhou, Feng
    Liu, Xiao
    Yue, Kaiyu
    Ding, Errui
    Ma, Yi
    COMPUTER VISION - ECCV 2018, PT V, 2018, 11209 : 139 - 155
  • [30] FiGO: Fine-Grained Query Optimization in Video Analytics
    Cao, Jiashen
    Sarkar, Karan
    Hadidi, Ramyad
    Arulraj, Joy
    Kim, Hyesoon
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA (SIGMOD '22), 2022, : 559 - 572