50 results in total
- [32] Multimodal-Enhanced Hierarchical Attention Network for Video Captioning. Multimedia Systems, 2023, 29, pp. 2469-2482
- [34] Hierarchical Conditional Relation Networks for Multimodal Video Question Answering. International Journal of Computer Vision, 2021, 129, pp. 3027-3050
- [35] Convolutional Hierarchical Attention Network for Query-Focused Video Summarization. Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, 2020, 34, pp. 12426-12433
- [36] Video Referring Expression Comprehension via Transformer with Content-Conditioned Query. Proceedings of the 1st International Workshop on Deep Multimodal Learning for Information Retrieval (MMIR 2023), 2023, pp. 39-48
- [37] I-Brow: Hierarchical and Multimodal Transformer Model for Eyebrows Animation Synthesis. Artificial Intelligence in HCI (AI-HCI 2023), Pt. II, 2023, 14051, pp. 435-452
- [38] HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021, pp. 11895-11905