50 entries in total
- [41] KAT: A Knowledge Augmented Transformer for Vision-and-Language. NAACL 2022: The 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022: 956-968
- [42] MAGVLT: Masked Generative Vision-and-Language Transformer. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023: 23338-23348
- [43] FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022: 14085-14095
- [45] UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 18082-18091
- [46] Data Efficient Masked Language Modeling for Vision and Language. Findings of the Association for Computational Linguistics, EMNLP 2021, 2021: 3013-3028
- [49] VTST: Efficient Visual Tracking With a Stereoscopic Transformer. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(3): 2401-2416