50 entries in total
- [1] Masked co-attention model for audio-visual event localization [J]. APPLIED INTELLIGENCE, 2024, 54 (02) : 1691 - 1705
- [3] Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021 : 4012 - 4021
- [4] Dual Attention Matching for Audio-Visual Event Localization [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 6301 - 6309
- [5] Temporal Cross-Modal Attention for Audio-Visual Event Localization [J]. Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2022, 88 (03) : 263 - 268
- [6] Audio-Visual Event Localization in Unconstrained Videos [J]. COMPUTER VISION - ECCV 2018, PT II, 2018, 11206 : 252 - 268
- [7] Cross-Modal Attention Network for Temporal Inconsistent Audio-Visual Event Localization [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 279 - 286
- [8] Learning Event-Specific Localization Preferences for Audio-Visual Event Localization [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 3446 - 3454
- [9] Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 3884 - 3892
- [10] Dual Perspective Network for Audio-Visual Event Localization [J]. COMPUTER VISION, ECCV 2022, PT XXXIV, 2022, 13694 : 689 - 704