Cover-based multiple book genre recognition using an improved multimodal network

Cited by: 3

Authors:

Rasheed, Assad [1]
Umar, Arif Iqbal [1]
Shirazi, Syed Hamad [1]
Khan, Zakir [1]
Shahzad, Muhammad [1]

Affiliations:

[1] Hazara Univ, Dept Informat Technol, Mansehra, Pakistan

Source:

INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION | 2023 / Vol. 26 / Issue 01

Keywords:

Book covers classification; CNN; Image classifiers; Multimodal learning; Text classifiers; CLASSIFICATION;

DOI:

10.1007/s10032-022-00413-8

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite the idiom not to prejudge something by its outward appearance, we use deep learning to examine whether we can judge a book by its cover or, more precisely, by its text and design. The classification was accomplished using three strategies: text only, image only, and both text and image. State-of-the-art CNN (convolutional neural network) models were used to classify books through cover images. Gram and SE (squeeze-and-excitation) layers were used as attention units in these models to learn the optimal features and identify characteristics from the cover image. The Gram layer enabled more accurate multi-genre classification than the SE layer. The text-based classification was done using word-based, character-based, and feature engineering-based models. We designed the EXplicit interActive Network (EXAN), composed of context-relevant layers and multi-level attention layers, to learn features from book titles. For multimodal classification, we designed an improved multimodal fusion architecture that uses an attention mechanism between modalities. The disparity in convergence speed between modalities is addressed by pre-training each sub-network independently prior to end-to-end training of the model. Two book cover datasets were used in this study. Results demonstrated that text-based classifiers are superior to image-based classifiers. The proposed multimodal network outperformed all models for this task, with the highest accuracies of 69.09% and 38.12% for the Latin and Arabic book cover datasets, respectively. Similarly, the proposed EXAN surpassed the existing text classification models, scoring the highest prediction rates of 65.20% and 33.8% for the Latin and Arabic book cover datasets, respectively.
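
The abstract names squeeze-and-excitation (SE) layers as one of the attention units applied to cover-image features. As a rough illustration of the general SE idea only, and not the authors' implementation, the sketch below pools each channel to a single number, passes the pooled vector through a small bottleneck (ReLU, then sigmoid), and rescales each channel by the resulting gate. The function name `squeeze_excite`, the weights `w1` and `w2`, and the toy feature map are all assumptions made for this example.

```python
import math

def squeeze_excite(features, w1, w2):
    """SE-style channel reweighting (illustrative sketch, not the paper's code).

    features: list of C channels, each a list of H rows of W floats.
    w1: bottleneck weights, shape (C_reduced, C). Hypothetical values.
    w2: expansion weights, shape (C, C_reduced). Hypothetical values.
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[[v * g for v in r] for r in ch] for ch, g in zip(features, gates)]
    return scaled, gates

# Toy input: two 2x2 channels, reduced to one hidden unit.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[1.0, 1.0]]
w2 = [[0.0], [1.0]]
scaled, gates = squeeze_excite(features, w1, w2)
print(gates)
print(scaled[0])
```

In a trained network the weights would be learned rather than fixed, and the gates would let the model emphasize the channels most useful for genre prediction.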


Pages: 65-88

Page count: 24
