共 50 条
- [1] End-to-End Dense Video Captioning with Masked Transformer [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8739 - 8748
- [2] End-to-End Dense Video Captioning with Parallel Decoding [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 6827 - 6837
- [3] End-to-End Video Captioning [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 1474 - 1482
- [4] End-to-end Generative Pretraining for Multimodal Video Captioning [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17938 - 17947
- [5] End-to-End Video Captioning with Multitask Reinforcement Learning [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 339 - 348
- [6] SWINBERT: End-to-End Transformers with Sparse Attention for Video Captioning [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17928 - 17937
- [8] End-to-End Transformer Based Model for Image Captioning [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 2585 - 2594
- [9] A Novel End-to-End Image Caption Based on Multimodal Attention [J]. Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (06): : 867 - 874