50 entries in total
- [11] Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 5674-5685
- [12] Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
- [13] A fine-grained vision and language representation framework with graph-based fashion semantic knowledge [J]. COMPUTERS & GRAPHICS-UK, 2023, 115: 216-225
- [15] Fine-grained Image Classification via Combining Vision and Language [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 7332-7340
- [16] Measuring Progress in Fine-grained Vision-and-Language Understanding [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023: 1559-1582
- [17] Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018: 1829-1838
- [19] Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 4725-4736
- [20] Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 2023-2031