50 entries in total
- [21] TVLT: Textless Vision-Language Transformer. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
- [22] Orthogonal Transformer: An Efficient Vision Transformer Backbone with Token Orthogonalization. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
- [23] Episodic Transformer for Vision-and-Language Navigation. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 15922-15932.
- [25] Masked Vision-language Transformer in Fashion. Machine Intelligence Research, 2023, 20: 421-434.
- [26] Green Hierarchical Vision Transformer for Masked Image Modeling. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022.
- [27] Convolutional Embedding Makes Hierarchical Vision Transformer Stronger. COMPUTER VISION, ECCV 2022, PT XX, 2022, 13680: 739-756.
- [28] Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023: 2082-2091.
- [29] Integrating language, vision and action for human robot dialog systems. UNIVERSAL ACCESS IN HUMAN-COMPUTER INTERACTION: AMBIENT INTERACTION, PT 2, PROCEEDINGS, 2007, 4555: 987-+.
- [30] HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 4768-4777.