50 items in total
- [2] Exploiting multiple modalities for interactive video retrieval [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004: 1032 - 1035
- [4] Visual versus Textual Embedding for Video Retrieval [J]. ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS (ACIVS 2017), 2017, 10617 : 386 - 395
- [5] Fusion of Audio and Video Modalities for Detection of Acoustic Events [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008: 123 - 126
- [6] Audio visual cues for video indexing and retrieval [J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2004, PT 1, PROCEEDINGS, 2004, 3331 : 642 - 649
- [7] Towards Fusion of Textual and Visual Modalities for Describing Audiovisual Documents [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2015, 6 (02): 52 - 70
- [9] Exploiting Visual Semantic Reasoning for Video-Text Retrieval [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020: 1005 - 1011
- [10] Audio-Visual Embedding for Cross-Modal Music Video Retrieval through Supervised Deep CCA [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM 2018), 2018: 143 - 150