Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning

Cited by: 2
Authors
Nguyen, Andre T. [1 ,2 ]
Richards, Luke E. [1 ,2 ]
Kebe, Gaoussou Youssouf [2 ]
Raff, Edward [1 ,2 ]
Darvish, Kasra [2 ]
Ferraro, Frank [2 ]
Matuszek, Cynthia [2 ]
Affiliations
[1] Booz Allen Hamilton, McLean, VA 22102 USA
[2] University of Maryland, Baltimore County, Baltimore, MD 21228 USA
DOI
10.1109/CVPRW53098.2021.00177
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines, which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
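The triplet objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean distance, the margin value, and the function name `triplet_loss` are assumptions chosen for illustration, and the toy vectors stand in for embeddings of RGB-depth images and language descriptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge-style triplet loss over a batch of embeddings.

    Each argument has shape (batch, dim). The distance metric (Euclidean)
    and the default margin are illustrative assumptions, not values taken
    from the paper.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)  # anchor to matching item
    d_neg = np.linalg.norm(anchor - negative, axis=-1)  # anchor to non-matching item
    # Penalize triples where the positive is not closer than the negative
    # by at least the margin.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Hypothetical toy embeddings: the anchor could come from one modality
# (e.g. an image encoder) and the positive/negative from the other
# (e.g. a language encoder).
anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 1.0]])
negative = np.array([[3.0, 4.0]])
loss = triplet_loss(anchor, positive, negative)
```

In this toy case the positive already sits more than one margin closer to the anchor than the negative does, so the loss is zero; swapping the positive and negative arguments yields a positive loss that a training step would then reduce.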
Pages: 1613-1622
Page count: 10