共 50 条
- [1] Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2485 - 2494
- [2] UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4153 - 4163
- [3] Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4233 - 4241
- [4] Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14660 - 14679
- [5] VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
- [7] CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4515 - 4524
- [8] COOKIE: Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2188 - 2197
- [9] CTAL: Pre-training Cross-modal Transformer for Audio-and-Language Representations [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3966 - 3977
- [10] PiTL: Cross-modal Retrieval with Weakly-supervised Vision-language Pre-training via Prompting [J]. PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2261 - 2265