VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Authors
Fei, Nanyi [1]
Jiang, Hao [2]
Lu, Haoyu [3]
Long, Jinqiang [3]
Dai, Yanqi [3]
Fan, Tuo [2]
Cao, Zhao [2]
Lu, Zhiwu [3]
Affiliations
[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China
[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China
[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-modal model; multi-task learning; cross-modal search
DOI
10.1007/978-3-031-56027-9_4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal search is one fundamental task in multi-modal learning, but there is hardly any work that aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because we integrate cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because we can freely assemble sub-modules of our flexible network architecture for corresponding tasks. Moreover, to give more choices on the effect-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO with only 37.6% network parameters compared to those needed for uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.
Pages: 56-72
Page count: 17
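
The abstract mentions placing multiple encoder exits to trade search quality against cost. The record gives no architectural details, so the following is only a toy sketch of the general early-exit idea, not the authors' implementation. Every name here (`MultiExitEncoder`, `search`, the random layers, the single shared encoder for queries and gallery items) is a hypothetical stand-in; a real cross-modal model would use trained, modality-specific encoders.

```python
import math
import random


def make_layer(dim, rng):
    """Random dim x dim weights (assumption: stands in for a trained layer)."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def apply_layer(weights, x):
    """One toy encoder layer: linear map, tanh, residual connection."""
    return [
        xi + math.tanh(sum(w * v for w, v in zip(row, x)))
        for xi, row in zip(x, weights)
    ]


def normalize(x):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


class MultiExitEncoder:
    """Layer stack that can stop early at any of the configured exit depths."""

    def __init__(self, dim, num_layers, exits, seed=0):
        rng = random.Random(seed)
        self.layers = [make_layer(dim, rng) for _ in range(num_layers)]
        self.exits = sorted(exits)

    def encode(self, x, exit_at):
        """Run only the first `exit_at` layers; fewer layers means less compute."""
        if exit_at not in self.exits:
            raise ValueError(f"exit_at must be one of {self.exits}")
        for weights in self.layers[:exit_at]:
            x = apply_layer(weights, x)
        return normalize(x)


def search(encoder, query, gallery, exit_at):
    """Return the index of the gallery item most similar to the query."""
    q = encoder.encode(query, exit_at)
    scores = [
        sum(a * b for a, b in zip(q, encoder.encode(item, exit_at)))
        for item in gallery
    ]
    return max(range(len(gallery)), key=scores.__getitem__)
```

Calling `search(encoder, query, gallery, exit_at=2)` uses a shallow exit and runs fewer layers, while `exit_at=6` runs the full stack; in a trained model the deeper exit would be expected to give better embeddings at higher cost.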