50 entries in total
- [2] Multi-modal Dense Video Captioning. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2020), 2020: 4117-4126
- [3] Multi-modal fusion for video understanding. 30th Applied Imagery Pattern Recognition Workshop, Proceedings: Analysis and Understanding of Time Varying Imagery, 2001: 103-108
- [5] Video Memorability Prediction via Late Fusion of Deep Multi-modal Features. 2021 IEEE International Conference on Image Processing (ICIP), 2021: 2488-2492
- [7] Layer-wise enhanced transformer with multi-modal fusion for image caption. Multimedia Systems, 2023, 29: 1043-1056
- [8] Class Consistent Multi-Modal Fusion with Binary Features. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015: 2282-2291
- [10] Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 19988-19997