Cover-based multiple book genre recognition using an improved multimodal network

Cited by: 3
Authors
Rasheed, Assad [1]
Umar, Arif Iqbal [1]
Shirazi, Syed Hamad [1]
Khan, Zakir [1]
Shahzad, Muhammad [1]
Affiliations
[1] Hazara Univ, Dept Informat Technol, Mansehra, Pakistan
Keywords
Book covers classification; CNN; Image classifiers; Multimodal learning; Text classifiers; CLASSIFICATION;
DOI
10.1007/s10032-022-00413-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the idiom that one should not judge something by its outward appearance, we use deep learning to examine whether a book can be judged by its cover or, more precisely, by its text and design. Classification was performed using three strategies: text only, image only, and both text and image. State-of-the-art convolutional neural network (CNN) models were used to classify books from their cover images. Gram and squeeze-and-excitation (SE) layers were used as attention units in these models to learn optimal features and identify characteristics of the cover image. The Gram layer enabled more accurate multi-genre classification than the SE layer. Text-based classification was done using word-based, character-based, and feature-engineering-based models. We designed the EXplicit interActive Network (EXAN), composed of context-relevant layers and multi-level attention layers, to learn features from book titles. For multimodal classification, we designed an improved multimodal fusion architecture that uses an attention mechanism between modalities. The disparity in the modalities' convergence speeds is addressed by pre-training each sub-network independently before end-to-end training of the model. Two book cover datasets were used in this study. Results demonstrated that text-based classifiers are superior to image-based classifiers. The proposed multimodal network outperformed all other models on this task, with highest accuracies of 69.09% and 38.12% on the Latin and Arabic book cover datasets, respectively. Similarly, the proposed EXAN surpassed existing text classification models, with highest prediction rates of 65.20% and 33.8% on the Latin and Arabic book cover datasets, respectively.
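The abstract names two attention units, the SE layer and the Gram layer, without giving their implementation. The sketch below is only an illustration of the standard building blocks those names usually refer to: a squeeze-and-excitation gate (global average pooling, a ReLU bottleneck, a sigmoid gate, channel rescaling) and a Gram matrix of channel activations. How the paper wires the Gram matrix into its CNNs is not stated here, so `gram_matrix` only computes the matrix itself. All function names, shapes, and weights are assumptions made for this example, and plain Python lists stand in for tensors.

```python
import math
import random


def squeeze(feature_maps):
    """Global average pooling: C x H x W nested lists -> list of C channel means."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]


def dense(vec, weights, bias):
    """Fully connected layer; `weights` has shape out_dim x in_dim."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation: rescale each channel by a gate in (0, 1)."""
    z = squeeze(feature_maps)                               # squeeze step
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]        # ReLU bottleneck
    gates = [1.0 / (1.0 + math.exp(-g))                     # sigmoid gate
             for g in dense(hidden, w2, b2)]
    scaled = [[[g * x for x in row] for row in ch]          # channel rescaling
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates


def gram_matrix(feature_maps):
    """C x C matrix of inner products between flattened channel activations."""
    flat = [[x for row in ch for x in row] for ch in feature_maps]
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in flat] for fi in flat]


# Toy usage with random, untrained weights (hypothetical values, not from the paper).
random.seed(0)
C, H, W, R = 4, 3, 3, 2  # channels, height, width, bottleneck size
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
w1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(R)]
w2 = [[random.uniform(-1, 1) for _ in range(R)] for _ in range(C)]
scaled, gates = se_block(fmap, w1, [0.0] * R, w2, [0.0] * C)
gram = gram_matrix(fmap)
print("channel gates:", [round(g, 3) for g in gates])
print("Gram matrix size:", len(gram), "x", len(gram[0]))
```

The SE gate summarizes each channel by a single mean, whereas the Gram matrix captures pairwise correlations between channels. That difference may be one reason the two units behave differently on cover images, though the abstract itself does not explain the gap.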
Pages: 65-88
Number of pages: 24