Center-enhanced video captioning model with multimodal semantic alignment

Authors
Zhang, Benhui [1, 2]
Gao, Junyu [2, 3]
Yuan, Yuan [2]
Affiliations
[1] School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
[2] School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an 710072, China
[3] Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
Keywords
Compendex;
DOI
10.1016/j.neunet.2024.106744
Abstract
Video analysis
Related papers (50 in total)
  • [41] Attentive Visual Semantic Specialized Network for Video Captioning
    Perez-Martin, Jesus
    Bustos, Benjamin
    Perez, Jorge
    2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 5767-5774
  • [42] Structured Encoding Based on Semantic Disambiguation for Video Captioning
    Sun, Bo
    Tian, Jinyu
    Wu, Yong
    Yu, Lunjun
    Tang, Yuanyan
    Cognitive Computation, 2024, vol. 16, no. 3, pp. 1032-1048
  • [43] Video Captioning with Semantic Information from the Knowledge Base
    Wang, Dan
    Song, Dandan
    2017 IEEE International Conference on Big Knowledge (IEEE ICBK 2017), 2017, pp. 224-229
  • [44] Video captioning with stacked attention and semantic hard pull
    Rahman, Md Mushfiqur
    Abedin, Thasin
    Prottoy, Khondokar S. S.
    Moshruba, Ayana
    Siddiqui, Fazlul Hasan
    PeerJ Computer Science, 2021, vol. 7, pp. 1-18
  • [45] Dense video captioning using unsupervised semantic information
    Estevam, Valter
    Laroca, Rayson
    Pedrini, Helio
    Menotti, David
    Journal of Visual Communication and Image Representation, 2025, vol. 107
  • [46] Richer Semantic Visual and Language Representation for Video Captioning
    Tang, Pengjie
    Wang, Hanli
    Wang, Hanzhang
    Xu, Kaisheng
    Proceedings of the 2017 ACM Multimedia Conference (MM'17), 2017, pp. 1871-1876
  • [47] Hierarchical & multimodal video captioning: Discovering and transferring multimodal knowledge for vision to language
    Liu, An-An
    Xu, Ning
    Wong, Yongkang
    Li, Junnan
    Su, Yu-Ting
    Kankanhalli, Mohan
    Computer Vision and Image Understanding, 2017, vol. 163, pp. 113-125
  • [48] Multimodal semantic enhanced representation network for micro-video event detection
    Li, Yun
    Liu, Xianyi
    Zhang, Lijuan
    Tian, Haoyu
    Jing, Peiguang
    Knowledge-Based Systems, 2024, vol. 301
  • [49] M3: Multimodal Memory Modelling for Video Captioning
    Wang, Junbo
    Wang, Wei
    Huang, Yan
    Wang, Liang
    Tan, Tieniu
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7512-7520
  • [50] From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning
    Song, Jingkuan
    Guo, Yuyu
    Gao, Lianli
    Li, Xuelong
    Hanjalic, Alan
    Shen, Heng Tao
    IEEE Transactions on Neural Networks and Learning Systems, 2019, vol. 30, no. 10, pp. 3047-3058