M3S: Scene Graph Driven Multi-Granularity Multi-Task Learning for Multi-Modal NER

Cited by: 18
Authors
Wang, Jie [1 ,2 ]
Yang, Yan [1 ,2 ]
Liu, Keyu [1 ,2 ]
Zhu, Zhiping [1 ,2 ]
Liu, Xiaorong [1 ,2 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Southwest Jiaotong Univ, Mfg Ind Chains Collaborat & Informat Support Tech, Chengdu 611756, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visualization; Task analysis; Semantics; Feature extraction; Multitasking; Social networking (online); Image segmentation; Named entity recognition; multi-modal learning; scene graph;
DOI
10.1109/TASLP.2022.3221017
CLC Number
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
Multi-modal Named Entity Recognition (MNER), which mainly focuses on enhancing text-only NER with visual information, has recently attracted considerable attention. Most current MNER models have made significant progress by jointly understanding visual and language modalities through layers of cross-modality attention. However, these approaches largely ignore the visual bias introduced by image contents and barely consider exploiting multi-granularity representations or the interactions between visual objects, which are essential for recognizing ambiguous entities. In this paper, we propose a Scene graph driven Multi-modal Multi-granularity Multi-task learning (M3S) framework to better exploit visual and textual information in MNER. Specifically, to explicitly alleviate visual bias, we present a novel multi-task approach that employs the task of Named Entity Segmentation (NES) cascaded with Named Entity Categorization (NEC). To obtain detailed visual semantics by explicitly modeling objects and the relationships between paired objects, we construct scene graphs as a structured representation of the visual contents. Furthermore, a well-designed Multi-granularity Gated Aggregation (MGA) mechanism is introduced to capture inter-modality interactions and extract critical features for named entity recognition. Extensive experiments on two real public datasets demonstrate the effectiveness of our proposed M3S.
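The gated aggregation idea behind MGA can be illustrated with a minimal sketch. Note this is a simplified, hypothetical single-granularity gate with random weights, not the paper's actual multi-granularity mechanism or trained parameters: a sigmoid gate computed from the concatenated text and visual features decides, per dimension, how much of each modality to keep.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(text_feat, visual_feat, W, b):
    # Gate in (0, 1) per dimension: 1 keeps the textual feature,
    # 0 admits the visual feature instead.
    gate = sigmoid(np.concatenate([text_feat, visual_feat]) @ W + b)
    return gate * text_feat + (1.0 - gate) * visual_feat

rng = np.random.default_rng(0)
d = 4                                   # hypothetical feature dimension
t = rng.standard_normal(d)              # stand-in textual feature
v = rng.standard_normal(d)              # stand-in visual feature
W = rng.standard_normal((2 * d, d)) * 0.1
b = np.zeros(d)
fused = gated_fusion(t, v, W, b)
```

Because the gate lies strictly in (0, 1), each fused dimension is a convex combination of the corresponding text and visual values; in the full model such gates are learned jointly with the NES and NEC objectives.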
Pages: 111 - 120
Page count: 10