50 entries in total
- [1] AVSegFormer: Audio-Visual Segmentation with Transformer [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 11, 2024: 12155 - 12163
- [2] The Right to Talk: An Audio-Visual Transformer Approach [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1085 - 1094
- [3] Effect of Audio-Visual Factors in the Evaluation of Crowd Noise [J]. APPLIED SCIENCES-BASEL, 2023, 13 (06)
- [4] AVMSN: An Audio-Visual Two Stream Crowd Counting Framework Under Low-Quality Conditions [J]. IEEE ACCESS, 2021, 9 : 80500 - 80510
- [6] Audio-visual event detection based on mining of semantic audio-visual labels [J]. STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
- [7] The Problems and Challenges of Managing Crowd Sourced Audio-Visual Evidence [J]. FUTURE INTERNET, 2014, 6 (02): 190 - 202
- [8] Audio-Visual Action Recognition Using Transformer Fusion Network [J]. APPLIED SCIENCES-BASEL, 2024, 14 (03)
- [9] A PRE-TRAINED AUDIO-VISUAL TRANSFORMER FOR EMOTION RECOGNITION [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 4698 - 4702