50 entries in total
- [21] Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022: 7070-7074
- [22] FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2669-2680
- [23] Semantic Scene Difference Detection in Daily Life Patroling by Mobile Robots using Pre-Trained Large-Scale Vision-Language Model. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023: 3228-3233
- [24] Daily Assistive View Control Learning of Low-Cost Low-Rigidity Robot via Large-Scale Vision-Language Model. 2023 IEEE-RAS 22ND INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, HUMANOIDS, 2023
- [26] Large-Scale Adversarial Training for Vision-and-Language Representation Learning. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
- [27] TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [28] NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models. COMPUTER VISION-ECCV 2024, PT VII, 2025, 15065: 260-278
- [30] Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63