Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment

Cited by: 0
Authors
Chen, Zhihao [1]
Zhou, Yang [2]
Tran, Anh [3]
Zhao, Junting [1]
Wan, Liang [1]
Ooi, Gideon Su Kai [4]
Cheng, Lionel Tim-Ee [3]
Thng, Choon Hua [4]
Xu, Xinxing [2]
Liu, Yong [2]
Fu, Huazhu [2]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] ASTAR, Inst High Performance Comp IHPC, 1 Fusionopolis Way 16-16 Connexis, Singapore 138632, Singapore
[3] Singapore Gen Hosp, Singapore, Singapore
[4] Natl Canc Ctr Singapore, Singapore, Singapore
Funding
National Research Foundation, Singapore
Keywords
Medical phrase grounding; vision-language model; contrastive learning
DOI
10.1007/978-3-031-43990-2_35
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Medical phrase grounding (MPG) aims to locate the most relevant region in a medical image, given a phrase query describing certain medical findings, which is an important task for medical image analysis and radiological diagnosis. However, existing visual grounding methods rely on general visual features for identifying objects in natural images and are not capable of capturing the subtle and specialized features of medical findings, leading to a sub-optimal performance in MPG. In this paper, we propose MedRPG, an end-to-end approach for MPG. MedRPG is built on a lightweight vision-language transformer encoder and directly predicts the box coordinates of mentioned medical findings, which can be trained with limited medical data, making it a valuable tool in medical image analysis. To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo). TaCo seeks context alignment to pull both the features and attention outputs of relevant region-phrase pairs close together while pushing those of irrelevant regions far away. This ensures that the final box prediction depends more on its finding-specific regions and phrases. Experimental results on three MPG datasets demonstrate that our MedRPG outperforms state-of-the-art visual grounding approaches by a large margin. Additionally, the proposed TaCo strategy is effective in enhancing finding localization ability and reducing spurious region-phrase correlations.
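The pull-together, push-apart idea described in the abstract can be illustrated with a generic contrastive loss. The following is a minimal sketch only, assuming an InfoNCE-style loss over cosine similarities between one phrase embedding and several candidate region embeddings. It is not the authors' TaCo implementation (which, per the abstract, also aligns attention outputs); the function names, the toy vectors, and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def region_phrase_contrastive_loss(phrase, regions, positive_index, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's exact loss).

    The loss is small when the phrase embedding is most similar to the
    region at `positive_index` and large when an irrelevant region
    is more similar instead.
    """
    logits = [cosine(phrase, r) / temperature for r in regions]
    # Numerically stable log-sum-exp over all candidate regions.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    # Negative log-probability of the relevant region under a softmax.
    return log_sum_exp - logits[positive_index]


# Toy embeddings (hypothetical): region 0 matches the phrase, the others do not.
phrase = [1.0, 0.0]
regions = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.2]]

aligned = region_phrase_contrastive_loss(phrase, regions, positive_index=0)
misaligned = region_phrase_contrastive_loss(phrase, regions, positive_index=1)
print(aligned < misaligned)
```

Minimizing such a loss increases the similarity of the relevant region-phrase pair relative to the irrelevant regions, which is the general behavior the abstract attributes to context alignment.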
Pages: 371-381
Number of pages: 11