NLP-Based Fusion Approach to Robust Image Captioning

Authors
Ricci, Riccardo [1]
Melgani, Farid [1]
Marcato Junior, Jose [2]
Goncalves, Wesley Nunes
Affiliations
[1] Univ Trento, Dept Informat Engn & Comp Sci, I-38123 Trento, Italy
[2] Univ Fed Mato Grosso do Sul, BR-79070900 Campo Grande, MS, Brazil
Keywords
Remote sensing; Visualization; Transformers; Training; Task analysis; Robustness; Vocabulary; Bidirectional encoder representations from transformers (BERT); contrastive language-image pretraining (CLIP); ensemble fusion; generative pretrained transformer (GPT); image captioning; natural language processing (NLP)
DOI
10.1109/JSTARS.2024.3413323
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
Robustness in remote sensing image captioning is crucial for real-world applications. However, most of the research focuses on improving the performance of single captioning algorithms, either by introducing novel feature processing units or metatasks that indirectly improve the captioning performance. Despite indisputable improvements in performance, we argue that relying on the output of a single model can be critical, especially when data scarcity limits the generalization capability of the trained algorithms. Focusing on the advantages of ensembles for improving robustness, we propose different ways to select or generate a single most coherent caption from a set of predictions made by different captioning algorithms. The disjunction between the two phases of prediction and selection/generation provides high flexibility for inserting different captioning algorithms, each with its peculiarities and strengths. In this context, based on neural natural language processing tools, our approach can be considered as an additional fusion block that enables higher robustness with a contained complexity burden.
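The selection idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the function names `jaccard` and `select_consensus` are hypothetical, and plain token overlap stands in for the neural NLP similarity tools (such as BERT or CLIP) that the keywords mention. Given captions predicted by several captioning algorithms, the sketch returns the one that agrees most, on average, with the others.

```python
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap in [0, 1]; an assumed stand-in for a neural similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def select_consensus(captions: List[str],
                     sim: Callable[[str, str], float] = jaccard) -> str:
    """Return the caption with the highest mean similarity to all the others."""
    if not captions:
        raise ValueError("need at least one caption")
    if len(captions) == 1:
        return captions[0]

    def mean_sim(i: int) -> float:
        # Average similarity of caption i to every other candidate.
        scores = [sim(captions[i], c) for j, c in enumerate(captions) if j != i]
        return sum(scores) / len(scores)

    best = max(range(len(captions)), key=mean_sim)
    return captions[best]


# Hypothetical predictions from four different captioning models.
candidates = [
    "many planes are parked at the airport",
    "several planes are parked near the airport terminal",
    "planes are parked at the airport",
    "a green field with some trees",
]
chosen = select_consensus(candidates)
```

Because the similarity function is passed in as `sim`, a sentence-embedding cosine similarity could replace the token overlap without changing the selection logic, which mirrors the separation between prediction and selection that the abstract emphasizes.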
Pages: 11809-11822
Page count: 14