Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning

Cited by: 2
Authors
Nguyen, Andre T. [1, 2]
Richards, Luke E. [1, 2]
Kebe, Gaoussou Youssouf [2]
Raff, Edward [1, 2]
Darvish, Kasra [2]
Ferraro, Frank [2]
Matuszek, Cynthia [2]
Affiliations
[1] Booz Allen Hamilton, McLean, VA 22102 USA
[2] University of Maryland, Baltimore County, Baltimore, MD 21228 USA
DOI
10.1109/CVPRW53098.2021.00177
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines, which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
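The two ingredients named in the abstract, a triplet loss over (anchor, positive, negative) embeddings and an optional Procrustes post-processing step, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the Euclidean distance, the margin value, the embedding sizes, and the function names `triplet_loss` and `procrustes_rotation` are all assumptions chosen for illustration, and random vectors stand in for real RGB-depth and language embeddings.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Mean hinge triplet loss over a batch of row-vector embeddings.

    Assumption: Euclidean distance and margin=0.4 are illustrative choices,
    not values taken from the paper.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Hypothetical stand-in data: "language" embeddings are a rotated copy of
# the "vision" embeddings, so an orthogonal map can align them exactly.
rng = np.random.default_rng(0)
vision = rng.normal(size=(8, 4))
true_rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
language = vision @ true_rotation
negatives = rng.normal(size=(8, 4))

loss_before = triplet_loss(language, vision, negatives)

# Optional post-processing: rotate the vision embeddings onto the language ones.
rotation = procrustes_rotation(vision, language)
aligned_vision = vision @ rotation
loss_after = triplet_loss(language, aligned_vision, negatives)
```

In this toy setup the rotation recovers the alignment exactly, so the positive distances drop to zero after post-processing. With learned embeddings the effect would depend on how well the two modalities are already aligned, which is the point the abstract makes about the step being helpful but not required.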
Pages: 1613-1622
Number of pages: 10