50 items in total
- [31] Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 551-568
- [32] InsightSee: Advancing Multi-agent Vision-Language Models for Enhanced Visual Understanding. 2024 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, ICMA 2024, 2024: 1471-1476
- [33] Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023: 21902-21912
- [34] Multi-Resolution Sensing for Real-Time Control with Vision-Language Models. CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
- [37] VinVL: Revisiting Visual Representations in Vision-Language Models. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 5575-5584
- [38] Evaluating Attribute Comprehension in Large Vision-Language Models. PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035: 98-113
- [39] Towards an Exhaustive Evaluation of Vision-Language Foundation Models. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 339-352
- [40] Attention Prompting on Image for Large Vision-Language Models. COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088: 251-268