50 entries in total
- [31] Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, pp. 4566-4577
- [33] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, pp. 6565-6574
- [35] A Framework for Video-Text Retrieval with Noisy Supervision. PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, pp. 373-383
- [36] Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, pp. 396-404
- [37] Multi-event Video-Text Retrieval. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, pp. 22056-22066
- [38] A Novel Convolutional Architecture for Video-Text Retrieval. 2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020
- [40] Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, vol. 61