50 items in total
- [1] Learning Multimodal Attention LSTM Networks for Video Captioning [J]. Proceedings of the 2017 ACM Multimedia Conference (MM'17), 2017 : 537 - 545
- [2] Multimodal Semantic Attention Network for Video Captioning [J]. 2019 IEEE International Conference on Multimedia and Expo (ICME), 2019 : 1300 - 1305
- [4] Multimodal attention-based transformer for video captioning [J]. Applied Intelligence, 2023, 53 (20) : 23349 - 23368
- [5] Hierarchical attention-based multimodal fusion for video captioning [J]. Neurocomputing, 2018, 315 : 362 - 370
- [6] Multimodal-enhanced hierarchical attention network for video captioning [J]. Multimedia Systems, 2023, 29 : 2469 - 2482
- [8] M3: Multimodal Memory Modelling for Video Captioning [J]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018 : 7512 - 7520
- [10] MAPS: Joint Multimodal Attention and POS Sequence Generation for Video Captioning [J]. 2021 International Conference on Visual Communications and Image Processing (VCIP), 2021