50 entries in total
- [2] Semantic-Aligned Cross-Modal Visual Grounding Network with Transformers [J]. APPLIED SCIENCES-BASEL, 2023, 13 (09)
- [3] CHAN: Cross-Modal Hybrid Attention Network for Temporal Language Grounding in Videos [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 1499 - 1504
- [5] Learning Cross-Modal Context Graph for Visual Grounding [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11645 - 11652
- [8] Utilizing visual attention for cross-modal coreference interpretation [J]. MODELING AND USING CONTEXT, PROCEEDINGS, 2005, 3554 : 83 - 96
- [9] Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering [J]. IEEE ACCESS, 2018, 6 : 31516 - 31524
- [10] Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 279 - 286