50 items in total
- [21] Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1456 - 1463
- [22] Self-Supervised Audio-Visual Feature Learning for Single-Modal Incremental Terrain Type Clustering [J]. IEEE ACCESS, 2021, 9 : 64346 - 64357
- [23] Single-modal Incremental Terrain Clustering from Self-Supervised Audio-Visual Feature Learning [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 9399 - 9406
- [24] Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3884 - 3892
- [28] AV-PedAware: Self-Supervised Audio-Visual Fusion for Dynamic Pedestrian Awareness [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023, : 1871 - 1877
- [30] Self-Supervised Video Representation and Temporally Adaptive Attention for Audio-Visual Event Localization [J]. APPLIED SCIENCES-BASEL, 2022, 12 (24):