Mining Visual and Textual Data for Constructing a Multi-Modal Thesaurus

Cited: 0
Authors
Frigui, Hichem [1]
Caudill, Joshua [1]
Affiliations
[1] Univ Louisville, Dept Comp Engn & Comp Sci, Louisville, KY 40292 USA
Keywords
Multimedia mining; multi-modal thesaurus; clustering; feature weighting; image annotation
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We propose an unsupervised approach to learn associations between continuous-valued attributes from different modalities. These associations are used to construct a multi-modal thesaurus that could serve as a foundation for inter-modality translation and for hybrid navigation and search algorithms. We focus on extracting associations between visual features and textual keywords. Visual features consist of low-level attributes extracted from image content, such as color, texture, and shape. Textual features consist of keywords that provide a description of the images. We assume that a collection of training images is available and that each image is globally annotated by a few keywords. The objective is to extract representative visual profiles that correspond to frequent homogeneous regions and to associate them with keywords. These profiles are then used to build the multi-modal thesaurus. The proposed approach was trained with a large collection of images, and the constructed thesaurus was used to label new images. Initial experiments indicate that we can achieve up to 71.9% relative improvement in captioning accuracy over the state of the art.
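The abstract describes the pipeline only at a high level, so the following minimal Python sketch illustrates the core idea: cluster region-level visual features into representative profiles and attach keyword weights to each profile. Plain k-means stands in here for the paper's clustering with feature weighting, and every name (build_thesaurus, annotate, region_feats, and so on) is an illustrative assumption, not the authors' implementation.

# Minimal sketch of the thesaurus-construction idea, assuming region-level
# visual descriptors and image-level keyword annotations are already given.
# Plain k-means is a stand-in for the paper's clustering with feature
# weighting; all names here are hypothetical.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def build_thesaurus(region_feats, region_img_ids, img_keywords, n_profiles=50):
    # region_feats   : (n_regions, d) array of low-level visual descriptors
    #                  (color, texture, shape) from segmented image regions
    # region_img_ids : length-n_regions list mapping each region to its image
    # img_keywords   : dict image_id -> list of global annotation keywords
    km = KMeans(n_clusters=n_profiles, n_init=10, random_state=0)
    labels = km.fit_predict(region_feats)

    # For each visual profile, count the keywords of the images whose regions
    # fall in that cluster, then normalize into association weights.
    thesaurus = {}
    for p in range(n_profiles):
        counts = Counter()
        for i in np.where(labels == p)[0]:
            counts.update(img_keywords[region_img_ids[i]])
        total = sum(counts.values()) or 1
        thesaurus[p] = {w: c / total for w, c in counts.items()}
    return km, thesaurus


def annotate(km, thesaurus, new_region_feats, top_k=3):
    # Label a new image: assign its regions to the nearest profiles and pool
    # the associated keyword weights.
    scores = Counter()
    for p in km.predict(new_region_feats):
        scores.update(thesaurus[p])
    return [w for w, _ in scores.most_common(top_k)]

Annotation of a new image thus reduces to mapping its regions onto the learned profiles and pooling their keyword weights; a real implementation would also need the segmentation and feature-extraction front end, which this sketch takes as given.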
Pages: 479-484
Page count: 6
Related Papers
50 in total
  • [1] Learning consumer preferences through textual and visual data: a multi-modal approach
    Liu, Xinyu
    Liu, Yezheng
    Qian, Yang
    Jiang, Yuanchun
    Ling, Haifeng
ELECTRONIC COMMERCE RESEARCH, 2023
  • [2] Multi-modal visual tracking based on textual generation
    Wang, Jiahao
    Liu, Fang
    Jiao, Licheng
    Wang, Hao
    Li, Shuo
    Li, Lingling
    Chen, Puhua
    Liu, Xu
    INFORMATION FUSION, 2024, 112
  • [3] Multi-modal recommendation algorithm fusing visual and textual features
    Hu, Xuefeng
    Yu, Wenting
    Wu, Yun
    Chen, Yukang
PLOS ONE, 2023, 18 (06)
  • [4] Building a multi-modal thesaurus from annotated images
    Frigui, Hichem
    Caudill, Joshua
18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 4, PROCEEDINGS, 2006: 198+
  • [5] Visual audio and textual triplet fusion network for multi-modal sentiment analysis
    Lv, Cai-Chao
    Zhang, Xuan
    Zhang, Hong-Bo
SIGNAL IMAGE AND VIDEO PROCESSING, 2024: 9505-9513
  • [6] Multi-modal Retrieval via Deep Textual-Visual Correlation Learning
    Song, Jun
    Wang, Yueyang
    Wu, Fei
    Lu, Weiming
    Tang, Siliang
    Zhuang, Yueting
INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: IMAGE AND VIDEO DATA ENGINEERING, ISCIDE 2015, PT I, 2015, 9242: 176-185
  • [7] Visual mining of multi-modal social networks at different abstraction levels
    Singh, Lisa
    Beard, Mitchell
    Getoor, Lise
11TH INTERNATIONAL CONFERENCE INFORMATION VISUALIZATION, 2007: 672+
  • [8] A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues
    Li, Yunxin
    Hu, Baotian
    Chen, Xinyu
    Ding, Yuxin
    Ma, Lin
    Zhang, Min
PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023: 10757-10770
  • [9] A multi-modal heterogeneous data mining algorithm using federated learning
    Wei, Xianyong
JOURNAL OF ENGINEERING-JOE, 2021, 2021 (08): 458-466