50 entries in total
- [22] Hierarchical attention-based multimodal fusion for video captioning [J]. Neurocomputing, 2018, 315 : 362 - 370
- [23] Multimodal-enhanced hierarchical attention network for video captioning [J]. Multimedia Systems, 2023, 29 : 2469 - 2482
- [24] Multimodal-enhanced hierarchical attention network for video captioning [J]. Multimedia Systems, 2023, 29 (05) : 2469 - 2482
- [25] End-to-end Generative Pretraining for Multimodal Video Captioning [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022 : 17938 - 17947
- [29] Video Captioning via Hierarchical Reinforcement Learning [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018 : 4213 - 4222
- [30] Learning deep spatiotemporal features for video captioning [J]. Pattern Recognition Letters, 2018, 116 : 143 - 149