38 results in total
- [1] MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 4659-4664
- [2] Temporally Multi-Modal Semantic Reasoning with Spatial Language Constraints for Video Question Answering. SYMMETRY-BASEL, 2022, 14 (06)
- [3] Differentiated Attention with Multi-modal Reasoning for Video Question Answering. 2022 IEEE INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, BIG DATA AND ALGORITHMS (EEBDA), 2022: 525-530
- [5] Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing. IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXVIII, 2022, 12267
- [7] Advancing Video Question Answering with a Multi-modal and Multi-layer Question Enhancement Network. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 3985-3993
- [8] Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 1313-1321