31 references in total
- [1] Jun Yu, Liang Wang, Zhou Yu, Research on visual question answering techniques[J], Journal of Computer Research and Development, 55, 9, (2018)
- [2] Lu Zhang, Feng Cao, Xinyan Liang, et al., Cross-modal retrieval with correlation feature propagation[J], Journal of Computer Research and Development, 59, 9, (2022)
- [3] Zhixin Li, Haiyang Wei, Canlong Zhang, et al., Research progress on image captioning[J], Journal of Computer Research and Development, 58, 9, pp. 1951-1974, (2021)
- [4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, et al., Deep residual learning for image recognition[C], Proc of the 34th IEEE Conf on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
- [5] Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh, Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet?[C], Proc of the 36th IEEE Conf on Computer Vision and Pattern Recognition, pp. 6546-6555, (2018)
- [6] Peter Anderson, Xiaodong He, Chris Buehler, et al., Bottom-up and top-down attention for image captioning and visual question answering[C], Proc of the 36th IEEE Conf on Computer Vision and Pattern Recognition, pp. 6077-6086, (2018)
- [7] Jiasen Lu, Jianwei Yang, Dhruv Batra, et al., Hierarchical question-image co-attention for visual question answering[C], Proc of the 30th Int Conf on Neural Information Processing Systems, pp. 289-297, (2016)
- [8] Jiyang Gao, Runzhou Ge, Kan Chen, et al., Motion appearance co-memory networks for video question answering[C], Proc of the 36th IEEE Conf on Computer Vision and Pattern Recognition, pp. 6576-6585, (2018)
- [9] Long Hoang Dang, Thao Minh Le, Vuong Le, et al., Hierarchical object-oriented spatiotemporal reasoning for video question answering[C], Proc of the 30th Int Joint Conf on Artificial Intelligence, pp. 636-642, (2021)
- [10] Jianwen Jiang, Ziqiang Chen, Haojie Lin, et al., Divide and conquer: Question-guided spatio-temporal contextual attention for video question answering[C], Proc of the 34th AAAI Conf on Artificial Intelligence, pp. 11101-11108, (2020)