Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus

Cited: 0
Authors
Frigui, Hichem [1]
Caudill, Joshua [1]
Affiliation
[1] Univ Louisville, Dept Comp Engn & Comp Sci, Louisville, KY 40292 USA
Keywords
Multimedia mining; multi-modal thesaurus; clustering; feature weighting; image annotation
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose an unsupervised approach to learning associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that can serve as a foundation for inter-modality translation and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual features consist of low-level attributes extracted from image content, such as color, texture, and shape. Textual features consist of keywords that describe the images. We assume that a collection of training images is available and that each image is globally annotated with a few keywords. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These profiles are then used to build the multi-modal thesaurus. The proposed approach was trained on a large collection of images, and the constructed thesaurus was used to label new images. Initial experiments indicate that we can achieve up to 71.9% relative improvement in captioning accuracy over the state of the art.
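The record contains no code, but the pipeline the abstract outlines (cluster region-level visual features into representative profiles, attach each profile's co-occurring caption keywords to it, then label new images by nearest-profile lookup) can be sketched briefly. The sketch below is an illustrative approximation only: it substitutes plain k-means for the clustering with feature weighting named in the keyword list, and every identifier in it (build_thesaurus, annotate, n_profiles) is hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: plain k-means stands in for the paper's
# clustering with feature weighting; all names here are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

def build_thesaurus(region_features, region_keywords, n_profiles=10):
    """Cluster region-level visual features into representative profiles
    and record how often each caption keyword co-occurs with each profile.

    region_features: (n_regions, n_dims) array of color/texture/shape features.
    region_keywords: list of n_regions keyword lists, each inherited from
                     the global annotation of the region's source image.
    """
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0)
    labels = km.fit_predict(region_features)
    profiles = km.cluster_centers_

    # Accumulate keyword co-occurrence counts per visual profile.
    assoc = [{} for _ in range(n_profiles)]
    for label, keywords in zip(labels, region_keywords):
        for kw in keywords:
            assoc[label][kw] = assoc[label].get(kw, 0) + 1

    # Normalize counts into association strengths in [0, 1].
    for counts in assoc:
        total = sum(counts.values()) or 1
        for kw in counts:
            counts[kw] /= total
    return profiles, assoc

def annotate(image_regions, profiles, assoc, top_k=3):
    """Label a new image: map each region to its nearest visual profile
    and pool the profiles' keyword association strengths."""
    scores = {}
    for x in image_regions:
        nearest = int(np.argmin(np.linalg.norm(profiles - x, axis=1)))
        for kw, weight in assoc[nearest].items():
            scores[kw] = scores.get(kw, 0.0) + weight
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In the method the record describes, cluster memberships and feature weights are learned jointly, so each visual profile emphasizes only the feature dimensions in which its regions are homogeneous; the uniform Euclidean distance used above ignores that refinement.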
Pages: 479-484
Page count: 6
Related Papers
50 records in total
  • [31] Tian, Xinyu; Zheng, Qinghe; Jiang, Nan. An Abnormal Behavior Detection Method Leveraging Multi-modal Data Fusion and Deep Mining. IAENG International Journal of Applied Mathematics, 2021, 51(01).
  • [32] Wang, Hai; Zhang, Guirong; Luo, Tong; Qiu, Meng; Cai, Yingfeng; Chen, Long. A Multi-modal Data Mining Algorithm for Corner Case of Automatic Driving Road Scene. Qiche Gongcheng/Automotive Engineering, 2024, 46(07): 1239-1248.
  • [33] Li, Yanan; Lin, Yuetan; Zhao, Honghui; Wang, Donghui. Dual Path Multi-Modal High-Order Features for Textual Content based Visual Question Answering. 2020 25th International Conference on Pattern Recognition (ICPR), 2021: 4324-4331.
  • [34] Wu, Wansen; Liu, Ting; Wang, Youkai; Xu, Kai; Yin, Quanjun; Hu, Yue. Dynamic Multi-modal Prompting for Efficient Visual Grounding. Pattern Recognition and Computer Vision, PRCV 2023, Pt VII, 2024, 14431: 359-371.
  • [35] Zheng, Qiushuo; Wen, Hao; Wang, Meng; Qi, Guilin. Visual Entity Linking via Multi-modal Learning. Data Intelligence, 2022, 4(01): 1-19.
  • [36] Marrinan, Thomas; Rizzi, Silvio; Nishimoto, Arthur; Johnson, Andrew; Insley, Joseph A.; Papka, Michael E. Interactive Multi-Modal Display Spaces for Visual Analysis. Proceedings of the 2016 ACM International Conference on Interactive Surfaces and Spaces (ISS 2016), 2016: 421-426.
  • [37] Tanikawa, Tomohiro; Hirose, Michitaka. A Study of Multi-modal Display System with Visual Feedback. Proceedings of the Second International Symposium on Universal Communication, 2008: 285-292.
  • [38] Zhang, Pengyu; Wang, Dong; Lu, Huchuan. Multi-modal visual tracking: Review and experimental comparison. Computational Visual Media, 2024, 10(02): 193-214.
  • [39] Chen, Sijia; Li, Baochun. Multi-Modal Dynamic Graph Transformer for Visual Grounding. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 15513-15522.