JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models

Cited by: 0
Authors
Wang, Zihao [1]
Cai, Shaofei [1]
Liu, Anji [2]
Jin, Yonggang [3]
Hou, Jinbing [3]
Zhang, Bowei [1]
Lin, Haowei [1]
He, Zhaofeng [3]
Zheng, Zilong [4]
Yang, Yaodong [1]
Ma, Xiaojian [4]
Liang, Yitao [1]
Affiliations
[1] PKU, China
[2] UCLA, United States
[3] BUPT
[4] BIGAI
Source
arXiv | 2023
Keywords
Engineering Village;
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Human like - Language model - Multi tasks - Multi-modal - Multimodal inputs - Open world - Planning and control - Task agents - Time progress - Visual observations
Related papers
6 items in total
  • [1] JARVIS-1: Open-World Multi-Task Agents With Memory-Augmented Multimodal Language Models
    Wang, Zihao
    Cai, Shaofei
    Liu, Anji
    Jin, Yonggang
    Hou, Jinbing
    Zhang, Bowei
    Lin, Haowei
    He, Zhaofeng
    Zheng, Zilong
    Yang, Yaodong
    Ma, Xiaojian
    Liang, Yitao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47(3): 1894-1907
  • [2] Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
    Wang, Zihao
    Cai, Shaofei
    Chen, Guanzhou
    Liu, Anji
    Ma, Xiaojian
    Liang, Yitao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [3] Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
    Sarch, Gabriel
    Wu, Yue
    Tarr, Michael J.
    Fragkiadaki, Katerina
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023: 3468-3500
  • [4] SIM: Open-World Multi-Task Stream Classifier with Integral Similarity Metrics
    Gao, Yang
    Li, Yi-Fan
    Dong, Bo
    Lin, Yu
    Khan, Latifur
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019: 751-760
  • [5] Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction
    Cai, Shaofei
    Wang, Zihao
    Ma, Xiaojian
    Liu, Anji
    Liang, Yitao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 13734-13744
  • [6] RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements
    Zhan, Zaifu
    Zhou, Shuang
    Li, Mingchen
    Zhang, Rui
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2025, 32(3): 545-554