50 entries in total
- [31] Subsampling of Frequent Words in Text for Pre-training a Vision-Language Model [J]. PROCEEDINGS OF THE 1ST WORKSHOP ON LARGE GENERATIVE MODELS MEET MULTIMODAL APPLICATIONS, LGM3A 2023, 2023, : 61 - 67
- [33] COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15671 - 15680
- [34] Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark [J]. CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, VOL 209, 2023, 209 : 117 - +
- [35] MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 23262 - 23271
- [36] Automated Bridge Inspection Image Interpretation Based on Vision-Language Pre-Training [J]. COMPUTING IN CIVIL ENGINEERING 2023-DATA, SENSING, AND ANALYTICS, 2024, : 1 - 8
- [37] Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner [J]. arXiv, 2023
- [38] VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [39] Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5120 - 5131
- [40] MAKE: Vision-Language Pre-training based Product Retrieval in Taobao Search [J]. COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 356 - 360