VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited by: 0

Authors:

Fei, Nanyi [1]

Jiang, Hao [2]

Lu, Haoyu [3]

Long, Jinqiang [3]

Dai, Yanqi [3]

Fan, Tuo [2]

Cao, Zhao [2]

Lu, Zhiwu [3]

Affiliations:

[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China

[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China

[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China

Source:

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024 / Vol. 14608

Funding:

National Natural Science Foundation of China;

Keywords:

multi-modal model; multi-task learning; cross-modal search;

DOI:

10.1007/978-3-031-56027-9_4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cross-modal search is a fundamental task in multi-modal learning, but hardly any work aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because it integrates cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because sub-modules of its flexible network architecture can be freely assembled for the corresponding tasks. Moreover, to offer more choices in the effectiveness-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show that VEMO is effective while using only 37.6% of the network parameters needed for uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.


Pages: 56-72

Number of pages: 17

50 items in total

[31] MBFusion: Multi-modal balanced fusion and multi-task learning for cancer diagnosis and prognosis
Zhang, Ziye
Yin, Wendong
Wang, Shijin
Zheng, Xiaorou
Dong, Shoubin
Computers in Biology and Medicine, 2024, 181
[32] Align vision-language semantics by multi-task learning for multi-modal summarization
Cui C.
Liang X.
Wu S.
Li Z.
Neural Computing and Applications, 2024, 36 (25) : 15653 - 15666
[33] Gaining Extra Supervision via Multi-task learning for Multi-Modal Video Question Answering
Kim, Junyeong
Ma, Minuk
Kim, Kyungsu
Kim, Sungjin
Yoo, Chang D.
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019
[34] Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems
Hao, Cong
Chen, Deming
2021 IEEE 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE CIRCUITS AND SYSTEMS (AICAS), 2021
[35] Multi-task Learning of Semantic Segmentation and Height Estimation for Multi-modal Remote Sensing Images
Mengyu WANG
Zhiyuan YAN
Yingchao FENG
Wenhui DIAO
Xian SUN
Journal of Geodesy and Geoinformation Science, 2023, 6 (04) : 27 - 39
[36] STARS: Soft Multi-Task Learning for Activity Recognition from Multi-Modal Sensor Data
Liu, Xi
Tan, Pang-Ning
Liu, Lei
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 569 - 581
[37] MULTI-MODAL MULTI-TASK LEARNING FOR SEMANTIC SEGMENTATION OF LAND COVER UNDER CLOUDY CONDITIONS
Xu, Fang
Shi, Yilei
Yang, Wen
Zhu, Xiaoxiang
IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023 : 6274 - 6277
[38] A multi-modal fusion framework based on multi-task correlation learning for cancer prognosis prediction
Tan, Kaiwen
Huang, Weixian
Liu, Xiaofeng
Hu, Jinlong
Dong, Shoubin
ARTIFICIAL INTELLIGENCE IN MEDICINE, 2022, 126
[39] Multi-Modal Multi-Task Learning for Joint Prediction of Clinical Scores in Alzheimer's Disease
Zhang, Daoqiang
Shen, Dinggang
MULTIMODAL BRAIN IMAGE ANALYSIS, 2011, 7012 : 60 - 67
[40] Fast Multi-Task SCCA Learning with Feature Selection for Multi-Modal Brain Imaging Genetics
Du, Lei
Liu, Kefei
Yao, Xiaohui
Risacher, Shannon L.
Han, Junwei
Guo, Lei
Saykin, Andrew J.
Shen, Li
Weiner, Michael
Aisen, Paul
Petersen, Ronald
Jack, Clifford R., Jr.
Jagust, William
Trojanowski, John Q.
Toga, Arthur W.
Beckett, Laurel
Green, Robert C.
Saykin, Andrew J.
Morris, John
Liu, Enchi
Montine, Tom
Gamst, Anthony
Thomas, Ronald G.
Donohue, Michael
Walter, Sarah
Gessert, Devon
Sather, Tamie
Harvey, Danielle
Kornak, John
Dale, Anders
Bernstein, Matthew
Felmlee, Joel
Fox, Nick
Thompson, Paul
Schuff, Norbert
Alexander, Gene
DeCarli, Charles
Bandy, Dan
Koeppe, Robert A.
Foster, Norm
Reiman, Eric M.
Chen, Kewei
Mathis, Chet
Cairns, Nigel J.
Taylor-Reinwald, Lisa
Shaw, Les
Lee, Virginia M. Y.
Korecka, Magdalena
Crawford, Karen
Neu, Scott
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018 : 356 - 361
