83 entries in total
- [41] Ren S Q, He K M, Girshick R, et al., Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 6, pp. 1137-1149, (2017)
- [42] Kiela D, Bottou L., Learning Image Embeddings Using Convolutional Neural Networks for Improved Multi-Modal Semantics, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 36-45, (2014)
- [43] Anderson P, He X D, Buehler C, et al., Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering, Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6077-6086, (2018)
- [44] Dong L H, Xu S, Xu B., Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5884-5888, (2018)
- [45] Purwins H, Li B, Virtanen T, et al., Deep Learning for Audio Signal Processing, IEEE Journal of Selected Topics in Signal Processing, 13, 2, pp. 206-219, (2019)
- [46] Hu Fengsong, Zhang Xuan, Speaker Recognition Method Based on Mel Frequency Cepstrum Coefficient and Inverted Mel Frequency Cepstrum Coefficient, Journal of Computer Applications, 32, 9, pp. 2542-2544, (2012)
- [47] Zhang X, Yuan J L, Li L, et al., Reducing the Bias of Visual Objects in Multimodal Named Entity Recognition, Proceedings of the 16th ACM International Conference on Web Search and Data Mining, pp. 958-966, (2023)
- [48] Liu P P, Li H, Ren Y M, et al., A Novel Framework for Multimodal Named Entity Recognition with Multi-level Alignments
- [49] Khare Y, Bagal V, Mathew M, et al., MMBERT: Multimodal BERT Pretraining for Improved Medical VQA, Proceedings of 2021 IEEE 18th International Symposium on Biomedical Imaging, pp. 1033-1036, (2021)
- [50] Jiang Y G, Wu Z X, Wang J, et al., Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 2, pp. 352-364, (2018)