50 items in total
- [41] VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
- [44] An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models. Computer Vision - ECCV 2024, Pt LXXXI, 2025, 15139: 19-35
- [45] Pix2Planning: End-to-End Planning by Vision-language Model for Autonomous Driving on Carla Simulator. 2024 35th IEEE Intelligent Vehicles Symposium, IEEE IV 2024, 2024: 2383-2390
- [48] Google Landmarks Dataset v2: A Large-Scale Benchmark for Instance-Level Recognition and Retrieval. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020: 2572-2581
- [49] The Casual Conversations v2 Dataset: A diverse, large benchmark for measuring fairness and robustness in audio/vision/speech models. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW, 2023: 10-17