50 items in total
- [41] Co-Attention Network With Question Type for Visual Question Answering [J]. IEEE ACCESS, 2019, 7 : 40771 - 40781
- [42] RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 547 - 556
- [43] Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering [J]. PROCEEDINGS OF THE 2023 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2023, 2023: 169 - 176
- [44] TASK-ORIENTED MULTI-MODAL QUESTION ANSWERING FOR COLLABORATIVE APPLICATIONS [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020: 1426 - 1430
- [45] MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023: 4659 - 4664
- [46] Multi-modal Question Answering System Driven by Domain Knowledge Graph [J]. 5TH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING AND COMMUNICATIONS (BIGCOM 2019), 2019: 43 - 47
- [49] Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering [J]. COMPUTER VISION - ECCV 2016, PT VII, 2016, 9911 : 451 - 466
- [50] A Context-aware Attention Network for Interactive Question Answering [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017: 927 - 935