共 50 条
- [1] VLCAP: VISION-LANGUAGE WITH CONTRASTIVE LEARNING FOR COHERENT VIDEO PARAGRAPH CAPTIONING [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 3656 - 3661
- [3] Hierarchical Language Modeling for Dense Video Captioning [J]. INVENTIVE COMPUTATION AND INFORMATION TECHNOLOGIES, ICICIT 2021, 2022, 336 : 421 - 431
- [4] Unified Vision-Language Pre-Training for Image Captioning and VQA [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13041 - 13049
- [5] Scaling Up Vision-Language Pre-training for Image Captioning [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17959 - 17968
- [7] Enhancing Video Summarization via Vision-Language Embedding [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1052 - 1060
- [8] Multi-task Learning of Hierarchical Vision-Language Representation [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10484 - 10493
- [9] Exploring the Synergy Between Vision-Language Pretraining and ChatGPT for Artwork Captioning: A Preliminary Study [J]. IMAGE ANALYSIS AND PROCESSING - ICIAP 2023 WORKSHOPS, PT II, 2024, 14366 : 309 - 321
- [10] Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining? [J]. ACM Transactions on Multimedia Computing, Communications and Applications, 20 (12):