Reinforced Cross-modal Alignment for Radiology Report Generation

Cited: 0
Authors: Qin, Han [1]; Song, Yan [1]
Affiliations: [1] Chinese Univ Hong Kong Shenzhen, Shenzhen, Peoples R China
Keywords: (none)
DOI: none available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
Medical images are widely used in clinical decision-making, where writing radiology reports is a potential application that can be enhanced by automatic solutions to alleviate physicians' workload. In general, radiology report generation is an image-text task, where cross-modal mappings between images and texts play an important role in generating high-quality reports. Although previous studies attempt to facilitate the alignment via the co-attention mechanism under supervised settings, they suffer from a lack of valid and accurate correspondences because no annotation of such alignment is available. In this paper, we propose an approach with reinforcement learning (RL) over a cross-modal memory (CMM) to better align visual and textual features for radiology report generation. In detail, a shared memory is used to record the mappings between visual and textual information, and the proposed reinforced algorithm is performed to learn the signal from the reports to guide the cross-modal alignment, even though such reports are not directly related to how images and texts are mapped. Experimental results on two English radiology report datasets, i.e., IU X-Ray and MIMIC-CXR, show the effectiveness of our approach, where state-of-the-art results are achieved. We further conduct a human evaluation and a case study, which confirm the validity of the reinforced algorithm in our approach.
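The abstract describes a shared memory that mediates between visual and textual features, with both modalities querying the same memory slots so their representations are aligned. The following is a minimal sketch of that memory-querying idea only, under assumed shapes and a hypothetical `query_memory` helper; the paper's actual CMM design and its reinforced training signal (a sequence-level reward computed from the generated reports) are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_memory(features, memory):
    """Attend over a shared cross-modal memory (hypothetical helper).

    features: (n, d) visual patch features or textual token embeddings
    memory:   (m, d) learned memory slots shared by both modalities
    Returns memory-augmented features of shape (n, d).
    """
    # Scaled dot-product attention scores against every memory slot.
    scores = features @ memory.T / np.sqrt(memory.shape[1])  # (n, m)
    weights = softmax(scores, axis=-1)                       # rows sum to 1
    # Each output is a convex combination of the shared slots, so
    # visual and textual features are expressed in a common basis.
    return weights @ memory

rng = np.random.default_rng(0)
d, m = 16, 8
memory = rng.normal(size=(m, d))
visual = rng.normal(size=(5, d))   # e.g., features from an image encoder
textual = rng.normal(size=(7, d))  # e.g., embeddings from a report

v_aug = query_memory(visual, memory)
t_aug = query_memory(textual, memory)
```

Because both modalities are rewritten as combinations of the same slots, a reward on the final report (e.g., an NLG metric, as the RL component suggests) can shape the memory even without alignment annotations.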
Pages: 448-458 (11 pages)
Related Papers (50 total)
  • [1] Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation
    Peng, Peixi
    Fan, Wanshu
    Shen, Yue
    Liu, Wenfei
    Yang, Xin
    Zhang, Qiang
    Wei, Xiaopeng
    Zhou, Dongsheng
    [J]. IEEE Journal of Biomedical and Health Informatics, 2024, 28 (12) : 7406 - 7419
  • [2] Visual prior-based cross-modal alignment network for radiology report generation
    Zhang, Sheng
    Zhou, Chuan
    Chen, Leiting
    Li, Zhiheng
    Gao, Yuan
    Chen, Yongqi
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 166
  • [3] Memory-Based Cross-Modal Semantic Alignment Network for Radiology Report Generation
    Tao, Yitian
    Ma, Liyan
    Yu, Jing
    Zhang, Han
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (07) : 4145 - 4156
  • [4] Cross-Modal Prototype Driven Network for Radiology Report Generation
    Wang, Jun
    Bhalerao, Abhir
    He, Yulan
    [J]. COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 563 - 579
  • [5] Visual-Textual Cross-Modal Interaction Network for Radiology Report Generation
    Zhang, Wenfeng
    Cai, Baoning
    Hu, Jianming
    Qin, Qibing
    Xie, Kezhen
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 984 - 988
  • [6] Cross-Modal Generation and Pair Correlation Alignment Hashing
    Ou, Weihua
    Deng, Jiaxin
    Zhang, Lei
    Gou, Jianping
    Zhou, Quan
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (03) : 3018 - 3026
  • [7] IFNet: An Image-Enhanced Cross-Modal Fusion Network for Radiology Report Generation
    Guo, Yi
    Hou, Xiaodi
    Liu, Zhi
    Zhang, Yijia
    [J]. BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024, 2024, 14954 : 286 - 297
  • [8] Chest radiology report generation based on cross-modal multi-scale feature fusion
    Pan, Yu
    Liu, Li-Jun
    Yang, Xiao-Bing
    Peng, Wei
    Huang, Qing-Song
    [J]. JOURNAL OF RADIATION RESEARCH AND APPLIED SCIENCES, 2024, 17 (01)
  • [9] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation
    Li, Mingjie
    Cai, Wenjia
    Verspoor, Karin
    Pan, Shirui
    Liang, Xiaodan
    Chang, Xiaojun
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 20624 - 20633
  • [10] Token Embeddings Alignment for Cross-Modal Retrieval
    Xie, Chen-Wei
    Wu, Jianmin
    Zheng, Yun
    Pan, Pan
    Hua, Xian-Sheng
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4555 - 4563