Towards real-time embodied AI agent: a bionic visual encoding framework for mobile robotics

Cited by: 0
Authors
Hou, Xueyu [1 ]
Guan, Yongjie [1 ]
Han, Tao [2 ]
Wang, Cong [2 ]
Affiliations
[1] Univ Maine, ECE Dept, Orono, ME 04469 USA
[2] New Jersey Inst Technol, ECE Dept, Newark, NJ USA
Keywords
Mobile robotics; Visual encoding; Embodied AI; Computer vision; ICONIC MEMORY;
DOI
10.1007/s41315-024-00363-w
CLC number
TP24 [Robotics];
Discipline codes
080202 ; 1405 ;
Abstract
Embodied artificial intelligence (AI) agents, which navigate and interact with their environment using sensors and actuators, are being deployed on mobile robotic platforms with limited computing power, such as autonomous vehicles, drones, and humanoid robots. These systems make decisions through environmental perception from deep neural network (DNN)-based visual encoders. However, the constrained computational resources and the large amounts of visual data to be processed can create bottlenecks, such as taking almost 300 milliseconds per decision on an embedded GPU board (Jetson Xavier). Existing DNN acceleration methods require model retraining and can still reduce accuracy. To address these challenges, our paper introduces a bionic visual encoder framework, Robye, to support the real-time requirements of embodied AI agents. The proposed framework complements existing DNN acceleration techniques. Specifically, we integrate motion data to identify overlapping areas between consecutive frames, which reduces DNN workload by propagating encoding results. We bifurcate processing into high resolution for task-critical areas and low resolution for less significant regions. This dual-resolution approach maintains task performance while lowering overall computational demands.
We evaluate Robye across three robotic scenarios: autonomous driving, vision-and-language navigation, and drone navigation, using various DNN models and mobile platforms. Robye outperforms baselines in speed (1.2-3.3×), task performance (+4% to +29%), and power consumption (-36% to -47%).
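The two ideas in the abstract (propagating cached encodings over the overlap between consecutive frames, and encoding only new regions at a resolution chosen by task criticality) can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple vertical camera shift of `shift` rows, per-row features, and placeholder `encode_hi`/`encode_lo` callables standing in for high- and low-resolution DNN encoders; `critical_mask` stands in for whatever task-criticality signal the agent provides.

```python
import numpy as np

def encode_frame(frame, prev_features, shift, critical_mask,
                 encode_hi, encode_lo):
    """Encode one frame, reusing cached features for rows that overlap
    the previous frame (camera panned down by `shift` rows) and running
    an encoder only on the newly exposed rows."""
    h = frame.shape[0]
    overlap = max(h - shift, 0)  # rows shared with the previous frame
    feats = np.empty(h)
    # Propagate cached results: new row r matches previous row r + shift.
    feats[:overlap] = prev_features[shift:shift + overlap]
    # Encode only the rows not covered by propagation, at a resolution
    # chosen per region: full pass for task-critical rows, cheap pass
    # (here, 4x subsampled) for the rest.
    for r in range(overlap, h):
        feats[r] = encode_hi(frame[r]) if critical_mask[r] else encode_lo(frame[r])
    return feats
```

With `shift = 2` on an 8-row frame, six of eight rows reuse cached features and only two are encoded, which is the workload reduction the framework relies on.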
Pages: 1038-1056
Page count: 19
Related papers
50 records in total
  • [1] Real-time support for mobile robotics
    Li, H
    Sweeney, J
    Ramamritham, K
    Grupen, R
    Shenoy, P
    9TH IEEE REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, PROCEEDINGS, 2003, : 10 - 18
  • [2] Towards a real-time framework for visual monitoring tasks
    Garcia, LM
    Grupen, RA
    THIRD IEEE INTERNATIONAL WORKSHOP ON VISUAL SURVEILLANCE, PROCEEDINGS, 2000, : 47 - 55
  • [3] LSAVISION a framework for real time vision mobile robotics
    Silva, Hugo
    Almeida, J. M.
    Lima, Luis
    Martins, A.
    Silva, E. P.
    Patacho, A.
    COMPUTATIONAL MODELLING OF OBJECTS REPRESENTED IN IMAGES: FUNDAMENTALS, METHODS AND APPLICATIONS, 2007, : 411 - 416
  • [4] Resource management for real-time tasks in mobile robotics
    Li, Huan
    Ramamritham, Krithi
    Shenoy, Prashant
    Grupen, Roderic A.
    Sweeney, John D.
    JOURNAL OF SYSTEMS AND SOFTWARE, 2007, 80 (07) : 962 - 971
  • [5] Real-Time Schedule for Mobile Robotics and WSN Aplications
    Chovanec, Michal
    Sarafin, Peter
    PROCEEDINGS OF THE 2015 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2015, 5 : 1199 - 1202
  • [6] Neuronal encoding of visual motion in real-time
    Warzecha, AK
    Egelhaaf, M
    MOTION VISION: COMPUTATIONAL, NEURAL, AND ECOLOGICAL CONSTRAINTS, 2001, : 239 - 277
  • [7] Dynamic aspects of visual servoing and a framework for real-time 3D vision for robotics
    Vincze, M
    Ayromlou, M
    Chroust, S
    Zillich, M
    Ponweiser, W
    Legenstein, D
    SENSOR BASED INTELLIGENT ROBOTS, 2002, 2238 : 101 - 121
  • [8] Towards Real-Time Trinocular Visual Odometry
    Jeong, Jaeheon
    Correll, Nikolaus
    MECHANICAL DESIGN AND POWER ENGINEERING, PTS 1 AND 2, 2014, 490-491 : 1424 - 1429
  • [9] A Computational Model for Managing Impressions of an Embodied Conversational Agent in Real-Time
    Biancardi, Beatrice
    Wang, Chen
    Mancini, Maurizio
    Cafaro, Angelo
    Chanel, Guillaume
    Pelachaud, Catherine
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [10] A Real-Time Visual Card Reader for Mobile Devices
    Stehr, Lukas
    Meusel, Robert
    Kopf, Stephan
    2016 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2016), 2016,