50 items in total
- [31] UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696: 521-539
- [33] Vital information matching in vision-and-language navigation. FRONTIERS IN NEUROROBOTICS, 2022, 16
- [34] MAGVLT: Masked Generative Vision-and-Language Transformer. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 23338-23348
- [35] Masked Path Modeling for Vision-and-Language Navigation. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 15255-15269
- [36] Federated Learning for Vision-and-Language Grounding Problems. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 11572-11579
- [37] VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [38] Local Slot Attention for Vision-and-Language Navigation. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022: 545-553
- [40] Behavioral Analysis of Vision-and-Language Navigation Agents. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2574-2582