50 entries in total
- [1] Focal Visual-Text Attention for Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6135 - 6143
- [3] Question Type Guided Attention in Visual Question Answering [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
- [4] Text-Guided Dual-Branch Attention Network for Visual Question Answering [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT III, 2018, 11166 : 750 - 760
- [5] Visual Question Answering using Explicit Visual Attention [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018
- [6] Towards Video Text Visual Question Answering: Benchmark and Baseline [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
- [8] CgT-GAN: CLIP-guided Text GAN for Image Captioning [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 2252 - 2263
- [9] SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 9666 - 9675
- [10] Multimodal Cross-guided Attention Networks for Visual Question Answering [J]. PROCEEDINGS OF THE 2018 INTERNATIONAL CONFERENCE ON COMPUTER MODELING, SIMULATION AND ALGORITHM (CMSA 2018), 2018, 151 : 347 - 353