VEMO: A Versatile Elastic Multi-modal Model for Search-Oriented Multi-task Learning

Cited by: 0

Authors:

Fei, Nanyi [1]

Jiang, Hao [2]

Lu, Haoyu [3]

Long, Jinqiang [3]

Dai, Yanqi [3]

Fan, Tuo [2]

Cao, Zhao [2]

Lu, Zhiwu [3]

Affiliations:

[1] Renmin Univ China, Sch Informat, Beijing, Peoples R China

[2] Huawei Poisson Lab, Hangzhou, Zhejiang, Peoples R China

[3] Renmin Univ China, Gaoling Sch Artificial Intelligence, Beijing, Peoples R China

Source:

ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I | 2024 / Vol. 14608

Funding:

National Natural Science Foundation of China;

Keywords:

multi-modal model; multi-task learning; cross-modal search;

DOI:

10.1007/978-3-031-56027-9_4

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Cross-modal search is a fundamental task in multi-modal learning, but there is hardly any work that aims to solve multiple cross-modal search tasks at once. In this work, we propose a novel Versatile Elastic Multi-mOdal (VEMO) model for search-oriented multi-task learning. VEMO is versatile because we integrate cross-modal semantic search, named entity recognition, and scene text spotting into a unified framework, where the latter two can be further adapted to entity- and character-based image search tasks. VEMO is also elastic because we can freely assemble sub-modules of our flexible network architecture for the corresponding tasks. Moreover, to offer more choices on the effectiveness-efficiency trade-off when performing cross-modal semantic search, we place multiple encoder exits. Experimental results show the effectiveness of our VEMO, which uses only 37.6% of the network parameters needed for uni-task training. Further evaluations on entity- and character-based image search tasks also validate the superiority of search-oriented multi-task learning.


Pages: 56-72

Number of pages: 17

