Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

Cited by: 13
Authors
Hao, Cong [1 ]
Chen, Deming [2 ]
Affiliations
[1] Georgia Institute of Technology, Atlanta, GA 30332, USA
[2] University of Illinois Urbana-Champaign, Urbana, IL, USA
DOI: 10.1109/AICAS51828.2021.9458577
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Optimizing the quality of result (QoR) and the quality of service (QoS) of AI-empowered autonomous systems simultaneously is very challenging. First, there are multiple input sources, e.g., multi-modal data from different sensors, requiring diverse data preprocessing, sensor fusion, and feature aggregation. Second, there are multiple tasks that require various AI models to run simultaneously, e.g., perception, localization, and control. Third, the computing and control system is heterogeneous, composed of hardware components with varied features, such as embedded CPUs, GPUs, FPGAs, and dedicated accelerators. Therefore, autonomous systems essentially require multi-modal multi-task (MMMT) learning that is aware of hardware performance and implementation strategies. While MMMT learning has been attracting intensive research interest, its application in autonomous systems is still underexplored. In this paper, we first discuss the opportunities for applying MMMT techniques in autonomous systems and then the unique challenges that must be solved. In addition, we discuss the necessity and opportunities of MMMT model and hardware co-design, which is critical for autonomous systems, especially those on power- or resource-limited and heterogeneous platforms. We formulate the co-design of the MMMT model and its heterogeneous hardware implementation as a differentiable optimization problem, with the objective of improving solution quality while reducing overall power consumption and critical-path latency. We advocate further exploration of MMMT in autonomous systems and of software/hardware co-design solutions.
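The abstract's differentiable co-design formulation can be made concrete with a small sketch. The following Python (PyTorch) snippet is a hypothetical illustration, not the authors' implementation: it relaxes the discrete task-to-device assignment into a softmax over learnable logits, so a surrogate for total power and critical-path latency can be added to the MMMT task loss and optimized jointly by gradient descent. All names, the random cost tables, and the trade-off weights LAMBDA_POWER and LAMBDA_LATENCY are illustrative assumptions.

```python
# Hypothetical sketch of a differentiable MMMT/hardware co-design objective.
# Everything here (cost tables, weights, the softmax relaxation) is assumed
# for illustration; it is not the paper's actual formulation or code.
import torch

NUM_TASKS = 3    # e.g., perception, localization, control
NUM_DEVICES = 4  # e.g., embedded CPU, GPU, FPGA, dedicated accelerator

torch.manual_seed(0)
# Assumed per-(task, device) costs, e.g., profiled offline on the platform.
power_cost = torch.rand(NUM_TASKS, NUM_DEVICES)    # placeholder watts
latency_cost = torch.rand(NUM_TASKS, NUM_DEVICES)  # placeholder milliseconds

# Continuous relaxation of the discrete task-to-device mapping:
# learnable logits turned into a soft assignment via softmax.
assign_logits = torch.zeros(NUM_TASKS, NUM_DEVICES, requires_grad=True)

LAMBDA_POWER, LAMBDA_LATENCY = 0.1, 0.1  # assumed trade-off weights

def codesign_penalty() -> torch.Tensor:
    """Differentiable surrogate for power plus critical-path latency."""
    assign = torch.softmax(assign_logits, dim=-1)  # rows sum to 1
    expected_power = (assign * power_cost).sum()   # expected total power
    # Tasks mapped to one device serialize, so approximate the critical path
    # by the busiest device's accumulated expected latency.
    per_device_latency = (assign * latency_cost).sum(dim=0)
    critical_path = per_device_latency.max()
    return LAMBDA_POWER * expected_power + LAMBDA_LATENCY * critical_path

# Joint objective: the penalty is added to the MMMT task losses so that model
# weights and the (relaxed) hardware mapping receive gradients together.
task_loss = torch.tensor(1.0, requires_grad=True)  # stand-in for the real loss
loss = task_loss + codesign_penalty()
loss.backward()
print(assign_logits.grad)  # nonzero: gradients reach the mapping variables
```

In practice, such a soft assignment would be annealed or discretized (e.g., via Gumbel-softmax or a final argmax) to recover a concrete mapping of tasks onto CPUs, GPUs, FPGAs, and accelerators.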
Pages: 5
Related Papers
50 records in total
  • [1] MultiNet: Multi-Modal Multi-Task Learning for Autonomous Driving
    Chowdhuri, Sauhaarda
    Pankaj, Tushar
    Zipser, Karl
    2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, : 1496 - 1504
  • [2] Multi-modal microblog classification via multi-task learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhao, Sendong
    Jiang, Xuesong
    Jiang, Xiaolei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (15) : 8921 - 8938
  • [3] Multi-Modal Multi-Task Learning for Automatic Dietary Assessment
    Liu, Qi
    Zhang, Yue
    Liu, Zhenguang
    Yuan, Ye
    Cheng, Li
    Zimmermann, Roger
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 2347 - 2354
  • [4] Multi-Task and Multi-Modal Learning for RGB Dynamic Gesture Recognition
    Fan, Dinghao
    Lu, Hengjie
    Xu, Shugong
    Cao, Shan
    IEEE SENSORS JOURNAL, 2021, 21 (23) : 27026 - 27036
  • [5] Multi-modal embeddings using multi-task learning for emotion recognition
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    INTERSPEECH 2020, 2020, : 384 - 388
  • [6] A Multi-modal Sentiment Recognition Method Based on Multi-task Learning
    Lin, Zijie
    Long, Yunfei
    Du, Jiachen
    Xu, Ruifeng
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2021, 57 (01): : 7 - 15
  • [7] Multi-task Learning for Multi-modal Emotion Recognition and Sentiment Analysis
    Akhtar, Md Shad
    Chauhan, Dushyant Singh
    Ghosal, Deepanway
    Poria, Soujanya
    Ekbal, Asif
    Bhattacharyya, Pushpak
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 370 - 379
  • [8] MultiMAE: Multi-modal Multi-task Masked Autoencoders
    Bachmann, Roman
    Mizrahi, David
    Atanov, Andrei
    Zamir, Amir
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 348 - 367
  • [9] Twitter Demographic Classification Using Deep Multi-modal Multi-task Learning
    Vijayaraghavan, Prashanth
    Vosoughi, Soroush
    Roy, Deb
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 478 - 483