Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus

Cited: 0
Authors
Frigui, Hichem [1]
Caudill, Joshua [1]
Affiliations
[1] Univ Louisville, Dept Comp Engn & Comp Sci, Louisville, KY 40292 USA
Keywords
Multimedia mining; multi-modal thesaurus; clustering; feature weighting; image annotation
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose an unsupervised approach to learn associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that could serve as a foundation for inter-modality translation and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual features consist of low-level attributes extracted from image content, such as color, texture, and shape. Textual features consist of keywords that provide a description of the images. We assume that a collection of training images is available and that each image is globally annotated by a few keywords. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These profiles are then used to build the multi-modal thesaurus. The proposed approach was trained with a large collection of images, and the constructed thesaurus was used to label new images. Initial experiments indicate that we can achieve up to 71.9% relative improvement in captioning accuracy over the state of the art.
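The abstract describes the pipeline only at a high level, so the following minimal Python sketch illustrates the core idea: cluster region-level visual features into representative profiles and attach keyword weights to each profile. Plain k-means stands in here for the paper's clustering with feature weighting, and every name (build_thesaurus, annotate, region_feats, and so on) is an illustrative assumption, not the authors' implementation.

# Minimal sketch of the thesaurus-construction idea, assuming region-level
# visual descriptors and image-level keyword annotations are already given.
# Plain k-means is a stand-in for the paper's clustering with feature
# weighting; all names here are hypothetical.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def build_thesaurus(region_feats, region_img_ids, img_keywords, n_profiles=50):
    # region_feats   : (n_regions, d) array of low-level visual descriptors
    #                  (color, texture, shape) from segmented image regions
    # region_img_ids : length-n_regions list mapping each region to its image
    # img_keywords   : dict image_id -> list of global annotation keywords
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0)
    labels = km.fit_predict(region_feats)

    # For each visual profile, count the keywords of the images whose regions
    # fall in that cluster, then normalize into association weights.
    thesaurus = {}
    for p in range(n_profiles):
        counts = Counter()
        for i in np.where(labels == p)[0]:
            counts.update(img_keywords[region_img_ids[i]])
        total = sum(counts.values()) or 1
        thesaurus[p] = {w: c / total for w, c in counts.items()}
    return km, thesaurus


def annotate(km, thesaurus, new_region_feats, top_k=3):
    # Label a new image: assign its regions to the nearest profiles and pool
    # the associated keyword weights.
    scores = Counter()
    for p in km.predict(new_region_feats):
        scores.update(thesaurus[p])
    return [w for w, _ in scores.most_common(top_k)]

Annotation of a new image thus reduces to mapping its regions onto the learned profiles and pooling their keyword weights; a real implementation would also need the segmentation and feature-extraction front end, which this sketch takes as given.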
Pages: 479-484
Page count: 6
Related Papers
50 in total
  • [1] Learning consumer preferences through textual and visual data: a multi-modal approach
    Liu, Xinyu
    Liu, Yezheng
    Qian, Yang
    Jiang, Yuanchun
    Ling, Haifeng
ELECTRONIC COMMERCE RESEARCH, 2023
  • [2] Multi-modal visual tracking based on textual generation
    Wang, Jiahao
    Liu, Fang
    Jiao, Licheng
    Wang, Hao
    Li, Shuo
    Li, Lingling
    Chen, Puhua
    Liu, Xu
    INFORMATION FUSION, 2024, 112
  • [3] Multi-modal recommendation algorithm fusing visual and textual features
    Hu, Xuefeng
    Yu, Wenting
    Wu, Yun
    Chen, Yukang
PLOS ONE, 2023, 18 (06)
  • [4] Building a multi-modal thesaurus from annotated images
    Frigui, Hichem
    Caudill, Joshua
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006: 198+
  • [5] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
SIGNAL IMAGE AND VIDEO PROCESSING, 2024: 9505-9513
  • [6] Multi-modal Retrieval via Deep Textual-Visual Correlation Learning
    Song, Jun
    Wang, Yueyang
    Wu, Fei
    Lu, Weiming
    Tang, Siliang
    Zhuang, Yueting
INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: IMAGE AND VIDEO DATA ENGINEERING, ISCIDE 2015, PT I, 2015, 9242: 176-185
  • [7] Visual mining of multi-modal social networks at different abstraction levels
    Singh, Lisa
    Beard, Mitchell
    Getoor, Lise
11TH INTERNATIONAL CONFERENCE INFORMATION VISUALIZATION, 2007: 672+
  • [8] A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
    Li, Yunxin
    Hu, Baotian
    Chen, Xinyu
    Ding, Yuxin
    Ma, Lin
    Zhang, Min
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 10757-10770
  • [9] A multi-modal heterogeneous data mining algorithm using federated learning
    Wei, Xianyong
JOURNAL OF ENGINEERING-JOE, 2021, 2021 (08): 458-466