50 results in total
- [1] Are Vision-Language Transformers Learning Multimodal Representations? A Probing Perspective. Thirty-Sixth AAAI Conference on Artificial Intelligence / Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence / Twelfth Symposium on Educational Advances in Artificial Intelligence, 2022, pp. 11248-11257
- [3] MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering. Findings of the Association for Computational Linguistics: EMNLP 2021, 2021, pp. 2280-2292
- [5] Learning Multimodal Representations for Unseen Activities. 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 506-515
- [7] Learning to Fuse Latent Representations for Multimodal Data. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3122-3126
- [8] Learning Disentangled Multimodal Representations for the Fashion Domain. 2018 IEEE Winter Conference on Applications of Computer Vision (WACV 2018), 2018, pp. 557-566