Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus

Cited by: 0
Authors
Frigui, Hichem [1 ]
Candill, Joshua [1 ]
Affiliations
[1] Univ Louisville, Dept Comp Engn & Comp Sci, Louisville, KY 40292 USA
Keywords
Multimedia mining; multi-modal thesaurus; clustering; feature weighting; image annotation
DOI: Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose an unsupervised approach to learn associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that could serve as a foundation for inter-modality translation and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual features consist of low-level attributes extracted from image content, such as color, texture, and shape. Textual features consist of keywords that provide a description of the images. We assume that a collection of training images is available and that each image is globally annotated with a few keywords. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These profiles would then be used to build the multi-modal thesaurus. The proposed approach was trained with a large collection of images, and the constructed thesaurus was used to label new images. Initial experiments indicate that we can achieve up to 71.9% relative improvement in captioning accuracy over the state of the art.
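The abstract outlines a concrete pipeline: cluster region-level visual features from globally annotated images, attach the keywords that co-occur with each cluster to form thesaurus entries, then annotate a new image by matching its regions to the nearest visual profiles. The sketch below is a minimal illustration of that idea only; it substitutes plain k-means for the paper's unsupervised clustering with feature weighting, and every name in it (the functions, the keyword-counting rule, the use of scikit-learn's KMeans) is an assumption for illustration, not the authors' implementation.

```python
# Minimal sketch of a multi-modal thesaurus built from globally
# annotated images. Assumption: the paper's clustering with feature
# weighting is replaced here by plain k-means for illustration.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def build_thesaurus(region_features, image_keywords, region_to_image, k=10):
    """Cluster region features and attach keyword profiles to clusters.

    region_features: (n_regions, n_features) array of visual attributes
        (e.g., color/texture/shape descriptors) for segmented regions.
    image_keywords:  list of keyword sets, one per training image.
    region_to_image: index of the source image for each region.
    """
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(region_features)

    # Each cluster inherits the keywords of the images its regions came
    # from; frequent keywords become the textual side of that profile.
    profiles = []
    for c in range(k):
        counts = Counter()
        for r in np.flatnonzero(km.labels_ == c):
            counts.update(image_keywords[region_to_image[r]])
        profiles.append(counts)
    return km, profiles


def annotate(km, profiles, new_region_features, top_n=3):
    """Label a new image by voting over its regions' nearest clusters."""
    votes = Counter()
    for c in km.predict(new_region_features):
        votes.update(profiles[c])
    return [kw for kw, _ in votes.most_common(top_n)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(40, 8))   # toy region descriptors
    kws = [{"sky", "sea"} if i % 2 else {"grass", "tree"} for i in range(10)]
    r2i = np.repeat(np.arange(10), 4)  # 4 regions per image
    km, profiles = build_thesaurus(feats, kws, r2i, k=4)
    print(annotate(km, profiles, rng.normal(size=(4, 8))))
```

Note that in the paper the visual profiles depend on per-cluster feature weighting, so that, for example, color can dominate one profile while texture dominates another; the fixed Euclidean distance in this k-means stand-in is only the coarsest approximation of that step.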
Pages: 479-484
Page count: 6