Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning

Cited by: 2

Authors:

Nguyen, Andre T. [1,2]

Richards, Luke E. [1,2]

Kebe, Gaoussou Youssouf [2]

Raff, Edward [1,2]

Darvish, Kasra [2]

Ferraro, Frank [2]

Matuszek, Cynthia [2]

Affiliations:

[1] Booz Allen Hamilton, McLean, VA 22102 USA

[2] University of Maryland, Baltimore County, Baltimore, MD 21228 USA

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021

DOI:

10.1109/CVPRW53098.2021.00177

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines, which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
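
As an illustration of the triplet objective the abstract describes, a minimal sketch in Python follows. It only shows the generic form of a triplet loss over one anchor, one positive, and one negative embedding. The Euclidean distance, the margin value, the function names, and the toy two-dimensional vectors are all assumptions made for illustration; the record does not state which distance, margin, or encoders the paper uses.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.4):
    """Generic triplet loss: zero once the positive is closer to the anchor
    than the negative by at least `margin` (the margin value is hypothetical)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical embeddings: a language description as the anchor, an RGB-D
# embedding of the same object as the positive, and one of a different
# object as the negative.
language_anchor = [1.0, 0.0]
vision_positive = [0.9, 0.1]
vision_negative = [0.0, 1.0]

loss = triplet_loss(language_anchor, vision_positive, vision_negative)
print(loss)
```

In this toy case the positive already sits much closer to the anchor than the negative does, so the loss is zero; swapping the positive and negative arguments gives a positive loss, which is the signal a training procedure would use to pull matching cross-modal pairs together.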

Pages: 1613-1622

Page count: 10
