50 entries in total
- [32] Audio-Visual Action Recognition Using Transformer Fusion Network [J]. APPLIED SCIENCES-BASEL, 2024, 14 (03):
- [33] Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition [J]. ICMI'18: PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2018: 111 - 115
- [34] Valence-Arousal Model based Emotion Recognition using EEG, peripheral physiological signals and Facial Expression [J]. ICMLSC 2020: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND SOFT COMPUTING, 2020: 81 - 85
- [36] CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization [J]. 2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023: 102 - 106
- [37] Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021: 4012 - 4021
- [38] Video clip recognition using joint audio-visual processing model [J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL I, PROCEEDINGS, 2002: 500 - 503
- [40] Video clip recognition using joint audio-visual processing model [J]. Proceedings - International Conference on Pattern Recognition, 2002, 16 (01): 500 - 503