33 references in total
- [11] Blei D M, Ng A Y, Jordan M I, Latent Dirichlet allocation, Journal of Machine Learning Research, 3, 1, pp. 993-1022, (2003)
- [12] Lu J, Xiong C, Parikh D, et al., Knowing when to look: Adaptive attention via a visual sentinel for image captioning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3242-3250, (2017)
- [13] You Q, Jin H, Wang Z, et al., Image captioning with semantic attention, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4651-4659, (2016)
- [14] Dai J, Li Y, He K, et al., R-FCN: Object detection via region-based fully convolutional networks, Advances in Neural Information Processing Systems (NIPS), pp. 379-387, (2016)
- [15] Gu J, Cai J, Wang G, et al., Stack-captioning: Coarse-to-fine learning for image captioning, Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), pp. 6837-6844, (2018)
- [16] Jiang W, Ma L, Jiang Y G, et al., Recurrent fusion network for image captioning, Proceedings of the European Conference on Computer Vision (ECCV), pp. 499-515, (2018)
- [17] Goodfellow I, Pouget-Abadie J, Mirza M, et al., Generative adversarial nets, Advances in Neural Information Processing Systems (NIPS), pp. 2672-2680, (2014)
- [18] Ranzato M A, Chopra S, Auli M, et al., Sequence level training with recurrent neural networks, (2015)
- [19] Rennie S J, Marcheret E, Mroueh Y, et al., Self-critical sequence training for image captioning, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1179-1195, (2017)
- [20] Dai B, Fidler S, Urtasun R, et al., Towards diverse and natural image descriptions via a conditional GAN, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 2970-2979, (2017)