Question Answering Algorithm on Image Fragmentation Information Based on Deep Neural Network

Authors
Wang Y. [1]
Zhuo Y. [1]
Wu Y. [1]
Chen M. [1]
Affiliation
[1] College of Mathematics and Computer Science, Fuzhou University, Fuzhou
Keywords
Artificial intelligence; Deep learning; Fragmented information; Neural network; Visual question answering (VQA)
DOI
10.7544/issn1000-1239.2018.20180606
Abstract
A large amount of fragmented information is scattered across different data sources such as text, images, video, and the Web. Such information is structurally disordered and one-sided in content. Current research extracts, represents, and understands multi-modal fragmented information by building visual question answering (VQA) systems. A VQA task requires producing the correct answer to a given question about a corresponding image. This paper aims to design a complete framework and algorithm for question answering over fragmented image information, set against the background of the VQA task. The main work covers image feature extraction, question text feature extraction, multi-modal feature fusion, and answer reasoning. Deep neural networks are constructed to extract features that represent the image and the question. An attention mechanism and a variational inference method are combined to fuse the image and question features and to infer the answer. Experimental results show that the model effectively extracts and understands multi-modal fragmented information and improves VQA accuracy. © 2018, Science Press. All rights reserved.
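The fusion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a plain dot-product attention over image region features, an element-wise product for fusion, and a linear answer classifier, and it leaves out the variational inference component entirely. All function names, vectors, and weights below are hypothetical toy values chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attend(region_feats, question_vec):
    """Weight each image region by its similarity to the question.

    Returns the attention weights and the weighted sum of region features.
    Dot-product scoring is an assumption, not the paper's scoring function.
    """
    weights = softmax([dot(v, question_vec) for v in region_feats])
    dim = len(region_feats[0])
    attended = [sum(w * v[d] for w, v in zip(weights, region_feats))
                for d in range(dim)]
    return weights, attended

def fuse(attended, question_vec):
    """Fuse the two modalities with an element-wise product (assumed)."""
    return [a * q for a, q in zip(attended, question_vec)]

def answer_scores(fused, answer_weights):
    """Score each candidate answer with a linear layer, then normalize."""
    return softmax([dot(w, fused) for w in answer_weights])

# Toy inputs: three image regions and one question, all 3-dimensional.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
question = [0.0, 2.0, 0.0]

weights, attended = attend(regions, question)
fused = fuse(attended, question)
probs = answer_scores(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

In this toy setup the question vector aligns with the second region, so that region receives the largest attention weight and the second candidate answer receives the higher probability. In a real system the region and question vectors would come from the image and text feature extractors the abstract mentions.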
Pages: 2600-2610
Number of pages: 10