14 items in total
- [1] Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [2] Towards Long-Form Video Understanding. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 1884 - 1894
- [3] VideoAgent: Long-Form Video Understanding with Large Language Model as Agent. COMPUTER VISION - ECCV 2024, PT LXXX, 2025, 15138: 58 - 76
- [4] EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
- [5] Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 1181 - 1186
- [6] Temporal-spatial information mining and aggregation for video matting. Multimedia Tools and Applications, 2024, 83: 29221 - 29237
- [8] VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models. COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128: 331 - 348
- [9] Selective Structured State-Spaces for Long-Form Video Understanding. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 6387 - 6397
- [10] MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14773 - 14783