50 entries in total
- [1] Multi-level, multi-modal interactions for visual question answering over text in images [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2022, 25 (04): 1607 - 1623
- [2] Multi-modal Contextual Graph Neural Network for Text Visual Question Answering [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 3491 - 3498
- [4] Multi-modal adaptive gated mechanism for visual question answering [J]. PLOS ONE, 2023, 18 (06)
- [6] Multi-level Attention Networks for Visual Question Answering [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4187 - 4195
- [9] Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing [J]. IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXVIII, 2022, 12267