Classifying Images of Intangible Cultural Heritages with Multimodal Fusion

Cited by: 0

Authors:

Tao F. [1]

Hao W. [1]

Yueyan L. [1]

Sanhong D. [1]

Affiliations:

[1] School of Information Management, Nanjing University, Nanjing

Source:

Data Analysis and Knowledge Discovery | 2022 / Vol. 6 / Issue 2-3

Funding:

National Natural Science Foundation of China

Keywords:

Digital Humanities; Image Classification; Multimodal Classification

DOI:

10.11925/infotech.2096-3467.2021.0911

Abstract:

[Objective] This paper proposes a new method that combines images with textual descriptions to improve the classification of Intangible Cultural Heritage (ICH) images. [Methods] We built a multimodal fusion model comprising a fine-tuned deep pre-trained model for extracting visual semantic features, a BERT model for extracting textual features, a fusion layer for concatenating the visual and textual features, and an output layer for predicting labels. [Results] We evaluated the proposed model on the national ICH project New Year Prints, classifying images as Mianzhu Prints, Taohuawu Prints, Yangjiabu Prints, or Yangliuqing Prints. Fine-tuning the convolutional layers strengthened the visual semantic features of the ICH images, and the F1 value for classification reached 72.028%. Compared with the baseline models, our method yielded the best results, with an F1 value of 77.574%. [Limitations] The proposed model was tested only on New Year Prints and needs to be extended to more ICH projects in the future. [Conclusions] Adding textual description features can improve the performance of ICH image classification. Fine-tuning the convolutional layers of a deep pre-trained image model can improve the extraction of visual semantic features. © 2022, Chinese Academy of Sciences. All rights reserved.
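The fusion and output layers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors, weights, and the helper names `fuse` and `predict` are hypothetical, and the real model would obtain its features from a fine-tuned image network and BERT rather than from hand-written toy values. The sketch only shows the concatenation step followed by a linear layer and softmax over the four print classes.

```python
import math

# The four New Year Print classes named in the abstract.
LABELS = ["Mianzhu", "Taohuawu", "Yangjiabu", "Yangliuqing"]


def fuse(visual, textual):
    """Fusion layer: concatenate the visual and textual feature vectors."""
    return list(visual) + list(textual)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(visual, textual, weights, bias):
    """Output layer: one linear unit per label over the fused vector, then softmax."""
    fused = fuse(visual, textual)
    scores = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy stand-ins (assumptions): a 3-dimensional visual feature vector and a
# 2-dimensional textual feature vector; real features would be far larger.
visual = [0.2, 0.7, 0.1]
textual = [0.9, 0.4]

# Hypothetical output-layer parameters: one weight row per label,
# one weight per fused feature.
weights = [
    [0.1, 0.0, 0.3, 0.2, 0.1],
    [0.0, 0.5, 0.1, 0.9, 0.2],
    [0.3, 0.1, 0.0, 0.1, 0.4],
    [0.2, 0.2, 0.2, 0.0, 0.1],
]
bias = [0.0, 0.0, 0.0, 0.0]

label, probs = predict(visual, textual, weights, bias)
print(label, [round(p, 3) for p in probs])
```

Concatenation keeps both modalities intact in a single vector, so the output layer can weight image and text evidence jointly.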

Pages: 329-337

Page count: 8
