50 items in total
- [31] MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 4659-4664
- [34] Text-Guided Multi-Modal Fusion for Underwater Visual Tracking. 2024 IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE, AVSS 2024, 2024
- [35] Video Visual Relation Detection via Multi-modal Feature Fusion. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 2657-2661
- [37] Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing. IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXVIII, 2022, 12267
- [38] Learning Visual Emotion Distributions via Multi-Modal Features Fusion. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017: 369-377