Global-to-Contextual Shared Semantic Learning for Fine-Grained Vision-Language Alignment

Cited by: 0
Authors
Zheng, Min [1]
Wu, Chunpeng [1]
Qin, Jiaqi [1]
Liu, Weiwei [1]
Chen, Ming [2]
Lin, Long [1]
Zhou, Fei [1]
Affiliations
[1] State Grid Smart Grid Res Inst Co Ltd, State Grid Lab Grid Adv Comp & Applicat, Beijing 102209, Peoples R China
[2] Xiamen Power Supply Co, State Grid Fujian Elect Power Co, Xiamen 361004, Peoples R China
Keywords
Fine-grained vision-language alignment; Shared semantic learning; Global-to-contextual feature representation;
DOI
10.1007/978-3-031-44198-1_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained vision-language alignment has two primary requirements: learning features effective enough to discriminate fine-grained sub-categories, and aligning heterogeneous data. This paper proposes a global-to-contextual shared semantic learning method for fine-grained vision-language alignment to address these challenges. Specifically, to make features more discriminative within each modality, the method extracts global and contextual vision and language features and learns them jointly. The method further constructs a shared semantic space that bridges the semantic correlation of heterogeneous data. Extensive experiments demonstrate the effectiveness of the approach.
Pages: 281-293
Number of pages: 13
Related papers
50 in total
  • [11] Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models
    Ma, Zheng
    Pan, Mianzhi
    Wu, Wenhan
    Cheng, Kanzhi
    Zhang, Jianbing
    Huang, Shujian
    Chen, Jiajun
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5674 - 5685
  • [12] Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision
    He, Keji
    Huang, Yan
    Wu, Qi
    Yang, Jianhua
    An, Dong
    Sima, Shuanglin
    Wang, Liang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [13] A fine-grained vision and language representation framework with graph-based fashion semantic knowledge
    Ding, Huiming
    Wang, Sen
    Xie, Zhifeng
    Li, Mengtian
    Ma, Lizhuang
    [J]. COMPUTERS & GRAPHICS-UK, 2023, 115 : 216 - 225
  • [14] Semantic interaction learning for fine-grained vehicle recognition
    Zhang, Jingjing
    Lei, Jingsheng
    Yang, Shengying
    Yang, Xinqi
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2022, 33 (01)
  • [15] Fine-grained Image Classification via Combining Vision and Language
    He, Xiangteng
    Peng, Yuxin
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 7332 - 7340
  • [16] Measuring Progress in Fine-grained Vision-and-Language Understanding
    Bugliarello, Emanuele
    Sartran, Laurent
    Agrawal, Aishwarya
    Hendricks, Lisa Anne
    Nematzadeh, Aida
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1559 - 1582
  • [17] Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment
    Shen, Dinghan
    Zhang, Xinyuan
    Henao, Ricardo
    Carin, Lawrence
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 1829 - 1838
  • [18] Semantic-Guided Information Alignment Network for Fine-Grained Image Recognition
    Wang, Shijie
    Wang, Zhihui
    Li, Haojie
    Chang, Jianlong
    Ouyang, Wanli
    Tian, Qi
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (11) : 6558 - 6570
  • [19] Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
    Zhu, Minghao
    Lin, Xiao
    Dang, Ronghao
    Liu, Chengju
    Chen, Qijun
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4725 - 4736
  • [20] Fine-Grained Representation Learning and Recognition by Exploiting Hierarchical Semantic Embedding
    Chen, Tianshui
    Wu, Wenxi
    Gao, Yuefang
    Dong, Le
    Luo, Xiaonan
    Lin, Liang
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 2023 - 2031