50 items in total
- [2] Visual Question Answering with Textual Representations for Images [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021 : 3147 - 3150
- [3] Dynamic Memory Networks for Visual and Textual Question Answering [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
- [4] Combining Multiple Cues for Visual Madlibs Question Answering [J]. International Journal of Computer Vision, 2019, 127 : 38 - 60
- [5] Visual-Textual Semantic Alignment Network for Visual Question Answering [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 259 - 270
- [9] Movienet: a movie multilayer network model using visual and textual semantic cues [J]. Applied Network Science, 4
- [10] Question Modifiers in Visual Question Answering [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022 : 1472 - 1479