共 38 条
- [22] TRANSTL: SPATIAL-TEMPORAL LOCALIZATION TRANSFORMER FOR MULTI-LABEL VIDEO CLASSIFICATION 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1965 - 1969
- [23] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 932 - 947
- [24] Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 4383 - 4389
- [26] Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3736 - 3744
- [27] MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 5751 - 5761
- [28] VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs 2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024, : 5793 - 5802