9 entries in total
- [1] Episodic Transformer for Vision-and-Language Navigation [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 15922 - 15932
- [2] MAGVLT: Masked Generative Vision-and-Language Transformer [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 23338 - 23348
- [3] History Aware Multimodal Transformer for Vision-and-Language Navigation [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [4] Multimodal Transformer with Variable-Length Memory for Vision-and-Language Navigation [J]. COMPUTER VISION, ECCV 2022, PT XXXVI, 2022, 13696 : 380 - 397
- [5] A Cross-Modal Object-Aware Transformer for Vision-and-Language Navigation [J]. 2022 IEEE 34TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2022: 976 - 981
- [6] SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
- [8] Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 16516 - 16526
- [9] Double-Fine-Tuning Multi-Objective Vision-and-Language Transformer for Social Media Popularity Prediction [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 9462 - 9466