Align vision-language semantics by multi-task learning for multi-modal summarization

Cited by: 0
Authors
Cui, Chenhao [1]
Liang, Xinnian [2]
Wu, Shuangzhi [3]
Li, Zhoujun [2]
Affiliations
[1] School of Cyber Science and Technology, Beihang University, Beijing, 100191, China
[2] School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
[3] Cloud Xiaowei, Tencent, Beijing, 100089, China
Keywords
Compendex;
DOI
10.1007/s00521-024-09908-3
Abstract
Embeddings
Pages: 15653-15666
Page count: 13
Related papers
50 in total
  • [1] Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models
    Long, Sifan
    Zhao, Zhen
    Yuan, Junkun
    Tan, Zichang
    Liu, Jiangjiang
    Zhou, Luping
    Wang, Shengsheng
    Wang, Jingdong
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21902 - 21912
  • [2] Multi-task Learning of Hierarchical Vision-Language Representation
    Duy-Kien Nguyen
    Okatani, Takayuki
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 10484 - 10493
  • [3] MCPL: Multi-Modal Collaborative Prompt Learning for Medical Vision-Language Model
    Wang, Pengyu
    Zhang, Huaqi
    Yuan, Yixuan
    [J]. IEEE Transactions on Medical Imaging, 2024, 43 (12) : 4224 - 4235
  • [4] Multi-modal microblog classification via multi-task learning
    Sicheng Zhao
    Hongxun Yao
    Sendong Zhao
    Xuesong Jiang
    Xiaolei Jiang
    [J]. Multimedia Tools and Applications, 2016, 75 : 8921 - 8938
  • [5] MultiNet: Multi-Modal Multi-Task Learning for Autonomous Driving
    Chowdhuri, Sauhaarda
    Pankaj, Tushar
    Zipser, Karl
    [J]. 2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1496 - 1504
  • [6] Multi-Modal Multi-Task Learning for Automatic Dietary Assessment
    Liu, Qi
    Zhang, Yue
    Liu, Zhenguang
    Yuan, Ye
    Cheng, Li
    Zimmermann, Roger
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 2347 - 2354
  • [7] Multi-modal microblog classification via multi-task learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhao, Sendong
    Jiang, Xuesong
    Jiang, Xiaolei
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 8921 - 8938
  • [8] VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
    Zhou, Wangchunshu
    Zeng, Yan
    Diao, Shizhe
    Zhang, Xinsong
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [9] Multi-modal anchor adaptation learning for multi-modal summarization
    Chen, Zhongfeng
    Lu, Zhenyu
    Rong, Huan
    Zhao, Chuanjun
    Xu, Fan
    [J]. NEUROCOMPUTING, 2024, 570
  • [10] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    [J]. IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036