DeepUnseen: Unpredicted Event Recognition Through Integrated Vision-Language Models

Citations: 0
Authors
Sakaino, Hidetomo [1 ]
Gaviphat, Natnapat [1 ]
Zamora, Louie [1 ]
Insisiengmay, Alivanh [1 ]
Ningrum, Dwi Fetiria [1 ]
Affiliations
[1] Weathernews Inc., Transportation Weather Lab, AI Image Lab, Chiba, Japan
Keywords
disaster; vision-language; DeepUnseen; combination; image captioning
DOI
10.1109/CAI54212.2023.00029
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep Learning-based segmentation models provide many benefits for scene understanding. However, such models have not been applied to or tested on unpredicted events such as natural disasters caused by hurricanes, tornadoes, and typhoons. Because low illumination, heavy rainfall, and storms degrade image quality, deploying only a single state-of-the-art (SOTA) model may fail to recognize objects correctly, and further enhancements to segmentation remain unsolved. This paper therefore proposes a vision-language-based Deep Learning model, DeepUnseen, which integrates different Deep Learning models to combine the benefits of classification and segmentation. Experimental results on disaster and traffic-accident scenes show superiority over a single SOTA Deep Learning model, and the integrated model yields semantically better refined classes.
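The abstract gives no implementation details, but its core idea (fusing a segmentation model's per-pixel classes with a vision-language model's scene-level semantics) can be sketched as below. This is a minimal illustrative sketch, not the paper's pipeline: the choice of DeepLabV3 and CLIP, the scene label list, the input path scene.jpg, and the fusion rule are all assumptions.

import torch
from PIL import Image
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
)
from transformers import CLIPModel, CLIPProcessor

# 1) Per-pixel classes from a standard segmentation model (assumed stand-in
# for the paper's SOTA segmenter).
weights = DeepLabV3_ResNet50_Weights.DEFAULT
seg_model = deeplabv3_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("scene.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    seg_logits = seg_model(preprocess(image).unsqueeze(0))["out"]  # [1, C, H, W]
pixel_classes = seg_logits.argmax(dim=1)  # [1, H, W] map of class indices

# 2) Scene-level label from a vision-language model, zero-shot.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
scene_labels = [  # assumed disaster/accident label set
    "a flooded road", "a traffic accident", "storm damage", "a normal street",
]
inputs = proc(text=scene_labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]

# 3) Naive fusion: pair the VLM's scene label with the segmentation map so
# that ambiguous regions can be reinterpreted downstream (e.g., "road" pixels
# in a scene CLIP labels "a flooded road").
scene = scene_labels[int(probs.argmax())]
print(f"scene: {scene}; segmented class ids: {pixel_classes.unique().tolist()}")

The fusion step here is only a placeholder; how DeepUnseen actually refines segmentation classes with vision-language output is described in the paper itself.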
Pages: 48-50
Page count: 3