50 entries in total
- [31] VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 21374-21383
- [32] COM Kitchens: An Unedited Overhead-View Video Dataset as a Vision-Language Benchmark. COMPUTER VISION - ECCV 2024, PT LXV, 2025, 15123: 123-140
- [34] Just Ask: An Interactive Learning Framework for Vision and Language Navigation. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 2459-2466
- [35] Counterfactual Cycle-Consistent Learning for Instruction Following and Generation in Vision-Language Navigation. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15450-15460
- [38] Image as a Foreign Language: BEIT Pretraining for Vision and Vision-Language Tasks. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 19175-19186
- [40] Masked Vision-language Transformer in Fashion. Machine Intelligence Research, 2023, 20: 421-434