Video Temporal Grounding with Multi-Model Collaborative Learning

Cited: 0
|
Authors
Tian, Yun [1 ]
Guo, Xiaobo [1 ]
Wang, Jinsong [1 ]
Li, Bin [2 ]
Zhou, Shoujun [2 ]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 6
Keywords
video temporal grounding; collaborative learning; pseudo-label; iterative training; HIERARCHY;
D O I
10.3390/app15063072
CLC Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
Given an untrimmed video and a natural language query, the video temporal grounding task aims to locate the target segment within the video. As a bridge between computer vision and natural language processing, this task is central to video comprehension. Current research focuses mainly on improving individual models, overlooking the potential of multi-model synergy. Although knowledge-flow methods have been adopted for multi-model and cross-modal collaborative learning, several concerns persist: unidirectional knowledge transfer, low-quality pseudo-label generation, and gradient conflicts in cooperative training. To address these issues, this research proposes a Multi-Model Collaborative Learning (MMCL) framework. Through a bidirectional knowledge-transfer paradigm, MMCL enables models to learn collaboratively by exchanging pseudo-labels. The pseudo-label generation mechanism is refined with the CLIP model's prior knowledge, improving label accuracy and temporal coherence while discarding irrelevant temporal fragments. The framework further integrates an iterative training algorithm for multi-model collaboration that mitigates gradient conflicts through alternating optimization, achieving a dynamic balance between collaborative and independent learning. Evaluations on multiple benchmark datasets show that MMCL markedly improves the performance of video temporal grounding models, surpassing existing state-of-the-art approaches in mIoU and Rank@1. The framework supports both homogeneous and heterogeneous model configurations, demonstrating broad versatility and adaptability.
This work offers an effective route to multi-model collaborative learning in video temporal grounding, promoting efficient knowledge transfer and opening new directions in video comprehension.
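The abstract's core mechanism — two models exchanging pseudo-labels while being updated in alternation — can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's implementation: `ToyGrounder`, the fixed learning rate, and the IoU-based acceptance gate (a crude stand-in for the CLIP-based pseudo-label filtering) are all illustrative assumptions.

```python
def iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

class ToyGrounder:
    """Stand-in for a grounding model: holds a predicted segment and
    nudges it toward a supervision target."""
    def __init__(self, start, end, lr=0.5):
        self.pred = [start, end]
        self.lr = lr

    def step(self, target):
        # Move the prediction a fraction of the way toward the pseudo-label.
        self.pred = [p + self.lr * (t - p) for p, t in zip(self.pred, target)]

def collaborative_train(model_a, model_b, rounds=10, accept_iou=0.3):
    """Alternating optimization: each round, one model is frozen as the
    teacher and emits a pseudo-label for the other (bidirectional transfer
    over time). Low-overlap pseudo-labels are rejected by a quality gate."""
    for r in range(rounds):
        teacher, student = (model_a, model_b) if r % 2 == 0 else (model_b, model_a)
        pseudo = list(teacher.pred)
        if iou(pseudo, student.pred) >= accept_iou:  # pseudo-label filtering
            student.step(pseudo)
    return model_a.pred, model_b.pred
```

Because only one model is updated per round, the two objectives never push the same parameters in the same step — the toy analogue of resolving gradient conflicts through alternate optimization; their predictions converge toward a consensus segment.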
Pages: 27
Related Papers
50 records
  • [31] A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development
    Knapp, Jesper
    Moberg, Klas
    Jin, Yuchuan
    Sun, Simin
    Staron, Miroslaw
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT. INDUSTRY-, WORKSHOP-, AND DOCTORAL SYMPOSIUM PAPERS, PROFES 2024, 2025, 15453 : 35 - 49
  • [32] Robust Multi-model Personalized Federated Learning via Model Distillation
    Muhammad, Adil
    Lin, Kai
    Gao, Jian
    Chen, Bincai
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT III, 2022, 13157 : 432 - 446
  • [33] Multi-model nature of time and temporal relations in the SL semantic language
    S. V. Elkin
    V. V. Kulikov
    E. S. Klyshinskii
    O. Yu. Mansurova
    V. Yu. Maksimov
    T. N. Musaeva
    S. N. Amineva
    Automatic Documentation and Mathematical Linguistics, 2008, 42 (1) : 53 - 65
  • [34] Multi-Model Nature of Time and Temporal Relations in the SL Semantic Language
    Elkin, S. V.
    Kulikov, V. V.
    Klyshinskii, E. S.
    Mansurova, O. Yu.
    Maksimov, V. Yu.
    Musaeva, T. N.
    Amineva, S. N.
    AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, 2008, 42 (01) : 53 - 65
  • [35] Research on Multi-Model Fusion for Multi-Indicator Collaborative Anomaly Prediction in IoT Devices
    Wang, Donghao
    Wang, Tengjiang
    Qi, Hao
    Liu, Shijun
    Pan, Li
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2816 - 2821
  • [36] Multi-branch Collaborative Learning Network for 3D Visual Grounding
    Qian, Zhipeng
    Ma, Yiwei
    Lin, Zhekai
    Ji, Jiayi
    Zheng, Xiawu
    Sun, Xiaoshuai
    Ji, Rongrong
    COMPUTER VISION-ECCV 2024, PT XLVI, 2025, 15104 : 381 - 398
  • [37] Multi-model partitioning: the multi-model evolutionary framework for intelligent control
    Lainiotis, DG
    PROCEEDINGS OF THE 2000 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL, 2000, : P15 - P20
  • [38] Intelligent Resource Scheduling for Co-located Latency-critical Services: A Multi-Model Collaborative Learning Approach
    Liu, Lei
    Dou, Xinglei
    Chen, Yuetao
    PROCEEDINGS OF THE 21ST USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, FAST 2023, 2023, : 153 - 166
  • [39] Enhancing video temporal grounding with large language model-based data augmentation
    Tian, Yun
    Guo, Xiaobo
    Wang, Jinsong
    Li, Bin
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05):
  • [40] Wind turbine blade icing detection with multi-model collaborative monitoring method
    Guo, Peng
    Infield, David
    RENEWABLE ENERGY, 2021, 179 : 1098 - 1105