Visual-Textual Cross-Modal Interaction Network for Radiology Report Generation

Times cited: 0
Authors
Zhang, Wenfeng [1]
Cai, Baoning [1]
Hu, Jianming [2]
Qin, Qibing [3, 4]
Xie, Kezhen [5]
Affiliations
[1] Chongqing Normal Univ, Coll Comp & Informat Sci, Chongqing 401331, Peoples R China
[2] Chongqing Normal Univ, Sch Phys & Elect Engn, Chongqing 401331, Peoples R China
[3] Weifang Univ, Sch Comp Engn, Weifang 261000, Peoples R China
[4] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266000, Peoples R China
[5] Ocean Univ China, Fac Informat Sci & Engn, Qingdao 266000, Peoples R China
Keywords
Abundant clinical information; cross-modal interaction; radiology report generation
DOI
10.1109/LSP.2024.3379005
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
The radiology report generation task produces diagnostic descriptions from radiology images, aiming to ease radiologists' onerous workload and to alert them to abnormalities. However, data bias remains a persistent challenge: abnormal regions usually occupy only a small portion of a radiology image, yet the report generation process should pay greater attention to them. Moreover, the available data volume is relatively small compared with the scale of large language models, which makes training difficult. To address these issues, we propose a Visual-textual Cross-modal Interaction Network (VCIN) to improve the quality of generated reports. VCIN comprises two key modules: Abundant Clinical Information Embedding (ACIE), which gathers rich cross-modal interaction information to promote report generation for abnormal regions; and a Bert-based Decoder-only Generator (BDG), built on the BERT architecture to mitigate training difficulties. Experimental results on two public benchmark datasets demonstrate the superior performance of the proposed model.
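The record does not say how BDG turns BERT, a bidirectional encoder, into a decoder-only generator. One common technique for this is a prefix-LM (UniLM-style) attention mask: the visual tokens form a prefix that attends bidirectionally within itself, while report tokens attend to the whole visual prefix plus earlier report tokens only. The sketch below assumes that design purely for illustration; `prefix_lm_mask` is a hypothetical helper, not the authors' code, and VCIN may work differently.

```python
from typing import List


def prefix_lm_mask(num_visual: int, num_text: int) -> List[List[int]]:
    """Hypothetical prefix-LM attention mask (1 = may attend, 0 = blocked).

    Assumption: the first `num_visual` positions hold visual tokens and the
    remaining `num_text` positions hold report tokens. Rows are query
    positions, columns are key positions.
    """
    total = num_visual + num_text
    mask = [[0] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if k < num_visual:
                # Every position (visual or text) may see the full visual prefix.
                mask[q][k] = 1
            elif q >= num_visual and k <= q:
                # Report tokens see earlier report tokens and themselves (causal).
                mask[q][k] = 1
            # Visual queries never see report tokens, so they stay 0.
    return mask


if __name__ == "__main__":
    for row in prefix_lm_mask(num_visual=2, num_text=3):
        print(row)
```

With two visual tokens and three report tokens, the first two rows allow attention only to the visual prefix, and each later row unlocks one more report position. This lets a pretrained bidirectional model generate text left to right without leaking future report tokens.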
Pages: 984-988
Number of pages: 5
Related papers (50 in total)
  • [1] Nonlinear Discrete Cross-Modal Hashing for Visual-Textual Data
    Ma, Dekui
    Liang, Jian
    He, Ran
    Kong, Xiangwei
    [J]. IEEE MULTIMEDIA, 2017, 24 (02) : 56 - 65
  • [2] Visual prior-based cross-modal alignment network for radiology report generation
    Zhang, Sheng
    Zhou, Chuan
    Chen, Leiting
    Li, Zhiheng
    Gao, Yuan
    Chen, Yongqi
    [J]. COMPUTERS IN BIOLOGY AND MEDICINE, 2023, 166
  • [3] Cross-Modal Prototype Driven Network for Radiology Report Generation
    Wang, Jun
    Bhalerao, Abhir
    He, Yulan
    [J]. COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 563 - 579
  • [4] Reinforced Cross-modal Alignment for Radiology Report Generation
    Qin, Han
    Song, Yan
    [J]. FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 448 - 458
  • [5] Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation
    Peng, Peixi
    Fan, Wanshu
    Shen, Yue
    Liu, Wenfei
    Yang, Xin
    Zhang, Qiang
    Wei, Xiaopeng
    Zhou, Dongsheng
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (12) : 7406 - 7419
  • [6] IFNet: An Image-Enhanced Cross-Modal Fusion Network for Radiology Report Generation
    Guo, Yi
    Hou, Xiaodi
    Liu, Zhi
    Zhang, Yijia
    [J]. BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024, 2024, 14954 : 286 - 297
  • [7] Memory-Based Cross-Modal Semantic Alignment Network for Radiology Report Generation
    Tao, Yitian
    Ma, Liyan
    Yu, Jing
    Zhang, Han
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (07) : 4145 - 4156
  • [8] Cross Modal Person Re-identification with Visual-Textual Queries
    Farooq, Ammarah
    Awais, Muhammad
    Kittler, Josef
    Akbari, Ali
    Khalid, Syed Safwan
    [J]. IEEE/IAPR INTERNATIONAL JOINT CONFERENCE ON BIOMETRICS (IJCB 2020), 2020,
  • [9] Visual-Textual Attentive Semantic Consistency for Medical Report Generation
    Zhou, Yi
    Huang, Lei
    Zhou, Tao
    Fu, Huazhu
    Shao, Ling
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3965 - 3974
  • [10] Deep Coordinated Textual and Visual Network for Sentiment-Oriented Cross-Modal Retrieval
    Fu, Jiamei
    She, Dongyu
    Yao, Xingxu
    Zhang, Yuxiang
    Yang, Jufeng
    [J]. PRICAI 2018: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2018, 11012 : 684 - 696