Video Temporal Grounding with Multi-Model Collaborative Learning

Cited: 0
|
Authors
Tian, Yun [1 ]
Guo, Xiaobo [1 ]
Wang, Jinsong [1 ]
Li, Bin [2 ]
Zhou, Shoujun [2 ]
Affiliations
[1] Changchun Univ Sci & Technol, Sch Optoelect Engn, Changchun 130022, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2025, Vol. 15, Issue 6
Keywords
video temporal grounding; collaborative learning; pseudo-label; iterative training; HIERARCHY;
D O I
10.3390/app15063072
CLC Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
Given an untrimmed video and a natural language query, the video temporal grounding task aims to locate the target segment within the video. As a bridge between computer vision and natural language processing, this task is central to video comprehension. Current research focuses mainly on improving individual models, overlooking the potential of multi-model synergy. Although knowledge-flow methods have been adopted for multi-model and cross-modal collaborative learning, several concerns persist: unidirectional knowledge transfer, low-quality pseudo-label generation, and gradient conflicts in cooperative training. To address these issues, this research proposes a Multi-Model Collaborative Learning (MMCL) framework. Through a bidirectional knowledge-transfer paradigm, MMCL enables models to learn collaboratively by exchanging pseudo-labels. The pseudo-label generation mechanism is refined with the CLIP model's prior knowledge, improving label accuracy and temporal coherence while discarding irrelevant temporal fragments. The framework further integrates an iterative training algorithm for multi-model collaboration that mitigates gradient conflicts through alternating optimization, achieving a dynamic balance between collaborative and independent learning. Evaluations on multiple benchmark datasets show that MMCL markedly improves the performance of video temporal grounding models, surpassing existing state-of-the-art approaches in mIoU and Rank@1. The framework supports both homogeneous and heterogeneous model configurations, demonstrating broad versatility and adaptability.
This work offers an effective route to multi-model collaborative learning in video temporal grounding, promoting efficient knowledge transfer and opening new directions in video comprehension.
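The abstract's core mechanism — two models exchanging pseudo-labels while being updated in alternation — can be illustrated with a toy sketch. This is a hypothetical simplification, not the paper's implementation: `ToyGrounder`, the fixed learning rate, and the IoU-based acceptance gate (a crude stand-in for the CLIP-based pseudo-label filtering) are all illustrative assumptions.

```python
def iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

class ToyGrounder:
    """Stand-in for a grounding model: holds a predicted segment and
    nudges it toward a supervision target."""
    def __init__(self, start, end, lr=0.5):
        self.pred = [start, end]
        self.lr = lr

    def step(self, target):
        # Move the prediction a fraction of the way toward the pseudo-label.
        self.pred = [p + self.lr * (t - p) for p, t in zip(self.pred, target)]

def collaborative_train(model_a, model_b, rounds=10, accept_iou=0.3):
    """Alternating optimization: each round, one model is frozen as the
    teacher and emits a pseudo-label for the other (bidirectional transfer
    over time). Low-overlap pseudo-labels are rejected by a quality gate."""
    for r in range(rounds):
        teacher, student = (model_a, model_b) if r % 2 == 0 else (model_b, model_a)
        pseudo = list(teacher.pred)
        if iou(pseudo, student.pred) >= accept_iou:  # pseudo-label filtering
            student.step(pseudo)
    return model_a.pred, model_b.pred
```

Because only one model is updated per round, the two objectives never push the same parameters in the same step — the toy analogue of resolving gradient conflicts through alternate optimization; their predictions converge toward a consensus segment.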
Pages: 27
Related Papers
50 records
  • [31] A Multi-model Approach for Video Data Retrieval in Autonomous Vehicle Development
    Knapp, Jesper
    Moberg, Klas
    Jin, Yuchuan
    Sun, Simin
    Staron, Miroslaw
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT. INDUSTRY-, WORKSHOP-, AND DOCTORAL SYMPOSIUM PAPERS, PROFES 2024, 2025, 15453 : 35 - 49
  • [32] Robust Multi-model Personalized Federated Learning via Model Distillation
    Muhammad, Adil
    Lin, Kai
    Gao, Jian
    Chen, Bincai
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT III, 2022, 13157 : 432 - 446
  • [33] Multi-model nature of time and temporal relations in the SL semantic language
    S. V. Elkin
    V. V. Kulikov
    E. S. Klyshinskii
    O. Yu. Mansurova
    V. Yu. Maksimov
    T. N. Musaeva
    S. N. Amineva
    Automatic Documentation and Mathematical Linguistics, 2008, 42 (1) : 53 - 65
  • [34] Multi-Model Nature of Time and Temporal Relations in the SL Semantic Language
    Elkin, S. V.
    Kulikov, V. V.
    Klyshinskii, E. S.
    Mansurova, O. Yu.
    Maksimov, V. Yu.
    Musaeva, T. N.
    Amineva, S. N.
    AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, 2008, 42 (01) : 53 - 65
  • [35] Research on Multi-Model Fusion for Multi-Indicator Collaborative Anomaly Prediction in IoT Devices
    Wang, Donghao
    Wang, Tengjiang
    Qi, Hao
    Liu, Shijun
    Pan, Li
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2816 - 2821
  • [36] Multi-branch Collaborative Learning Network for 3D Visual Grounding
    Qian, Zhipeng
    Ma, Yiwei
    Lin, Zhekai
    Ji, Jiayi
    Zheng, Xiawu
    Sun, Xiaoshuai
    Ji, Rongrong
    COMPUTER VISION-ECCV 2024, PT XLVI, 2025, 15104 : 381 - 398
  • [37] Multi-model partitioning: the multi-model evolutionary framework for intelligent control
    Lainiotis, DG
    PROCEEDINGS OF THE 2000 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL, 2000, : P15 - P20
  • [38] Intelligent Resource Scheduling for Co-located Latency-critical Services: A Multi-Model Collaborative Learning Approach
    Liu, Lei
    Dou, Xinglei
    Chen, Yuetao
    PROCEEDINGS OF THE 21ST USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, FAST 2023, 2023, : 153 - 166
  • [39] Enhancing video temporal grounding with large language model-based data augmentation
    Tian, Yun
    Guo, Xiaobo
    Wang, Jinsong
    Li, Bin
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (05):
  • [40] Wind turbine blade icing detection with multi-model collaborative monitoring method
    Guo, Peng
    Infield, David
    RENEWABLE ENERGY, 2021, 179 : 1098 - 1105