共 50 条
- [1] SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
- [2] Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 13238 - 13246
- [3] History Aware Multimodal Transformer for Vision-and-Language Navigation [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [4] Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4233 - 4241
- [6] Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9847 - 9857
- [7] Episodic Transformer for Vision-and-Language Navigation [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 15922 - 15932
- [8] Cross-Modal Attribute Insertions for Assessing the Robustness of Vision-and-Language Learning [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 15974 - 15990
- [9] Cross-modal Map Learning for Vision and Language Navigation [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15439 - 15449
- [10] Transformer-Exclusive Cross-Modal Representation for Vision and Language [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2719 - 2725