Cover-based multiple book genre recognition using an improved multimodal network

Times cited: 3
Authors
Rasheed, Assad [1]
Umar, Arif Iqbal [1]
Shirazi, Syed Hamad [1]
Khan, Zakir [1]
Shahzad, Muhammad [1]
Affiliation
[1] Hazara University, Department of Information Technology, Mansehra, Pakistan
Keywords
Book covers classification; CNN; Image classifiers; Multimodal learning; Text classifiers; CLASSIFICATION
DOI
10.1007/s10032-022-00413-8
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the idiom warning against judging something by its outward appearance, we use deep learning to test whether we can judge a book by its cover or, more precisely, by its text and design. Classification was performed using three strategies: text only, image only, and combined text and image. State-of-the-art convolutional neural network (CNN) models were used to classify books from their cover images. Gram and squeeze-and-excitation (SE) layers were inserted into these models as attention units to learn optimal features and identify characteristics of the cover image. The Gram layer yielded more accurate multi-genre classification than the SE layer. Text-based classification was performed using word-based, character-based, and feature-engineering-based models. We designed the EXplicit interActive Network (EXAN), composed of context-relevant layers and multi-level attention layers, to learn features from book titles. For multimodal classification, we designed an improved fusion architecture that applies an attention mechanism between modalities. The disparity in convergence speed between the modalities is addressed by pre-training each sub-network independently before end-to-end training of the whole model. Two book cover datasets were used in this study. The results show that text-based classifiers outperform image-based classifiers. The proposed multimodal network outperformed all other models on this task, achieving the highest accuracies of 69.09% and 38.12% on the Latin and Arabic book cover datasets, respectively. Likewise, the proposed EXAN surpassed existing text classification models, with the highest prediction rates of 65.20% and 33.8% on the Latin and Arabic book cover datasets, respectively.
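The abstract names squeeze-and-excitation (SE) and Gram layers as attention units but does not give their configuration. The following is a minimal, dependency-free Python sketch of the standard SE block (global average pooling, a two-layer bottleneck with ReLU and sigmoid, then per-channel rescaling) and of a plain channel Gram matrix. The function names, the omission of biases, the choice of reduction ratio, and the normalization of the Gram matrix by the number of spatial positions are assumptions for illustration, not the authors' implementation. How the paper turns the Gram matrix into an attention signal is not stated in the abstract, so only the matrix itself is shown.

```python
import math
from typing import List

Matrix = List[List[float]]  # a 2-D grid stored as rows


def squeeze_excite(feature_maps: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Standard SE block (illustrative sketch, biases omitted by assumption).

    feature_maps: C channels, each an H x W grid.
    w1: (C // r) x C weights of the reduction layer (r is a hypothetical ratio).
    w2: C x (C // r) weights of the expansion layer.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    scores = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every activation of channel c by scores[c].
    return [[[v * s for v in row] for row in fm] for fm, s in zip(feature_maps, scores)]


def gram_matrix(feature_maps: List[Matrix]) -> Matrix:
    """C x C inner products between flattened channels.

    Dividing by the number of spatial positions is an assumed normalization.
    """
    flat = [[v for row in fm for v in row] for fm in feature_maps]
    n = len(flat[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in flat] for fi in flat]
```

For example, with two 2x2 channels holding the constants 1 and 2, `w1 = [[0.5, 0.5]]` and `w2 = [[1.0], [-1.0]]`, the squeezed descriptor is [1, 2], the single hidden unit is 1.5, and the two channels are rescaled by sigmoid(1.5) ≈ 0.82 and sigmoid(-1.5) ≈ 0.18. The SE block thus reweights whole channels, whereas the Gram matrix captures pairwise channel correlations (here [[1, 2], [2, 4]]).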
Pages: 65-88
Number of pages: 24
Related papers
50 results in total
  • [31] Scene recognition using multiple representation network
    Lin, Chaowei
    Lee, Feifei
    Xie, Lin
    Cai, Jiawei
    Chen, Hanqing
    Liu, Li
    Chen, Qiu
    APPLIED SOFT COMPUTING, 2022, 118
  • [32] Multimodal Emotion Recognition Using a Hierarchical Fusion Convolutional Neural Network
    Zhang, Yong
    Cheng, Cheng
    Zhang, Yidie
    IEEE ACCESS, 2021, 9 : 7943 - 7951
  • [33] Multimodal function optimizations with multiple maximums and multiple minimums using an improved PSO algorithm
    Chang, Wei-Der
    APPLIED SOFT COMPUTING, 2017, 60 : 60 - 72
  • [34] An AI-based Approach for Improved Sign Language Recognition using Multiple Videos
    Dignan, Cameron
    Perez, Eliud
    Ahmad, Ishfaq
    Huber, Manfred
    Clark, Addison
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34525 - 34546
  • [36] Recognition of Students' Multiple Mental States in Conversation Based on Multimodal Cues
    Peng, Shimeng
    Ohira, Shigeki
    Nagao, Katashi
    COMPUTER SUPPORTED EDUCATION (CSEDU 2020), 2021, 1473 : 468 - 479
  • [37] Multimodal Continuous Affect Recognition Based on LSTM and Multiple Kernel Learning
    Wei, Jiamei
    Pei, Ercheng
    Jiang, Dongmei
    Sahli, Hichem
    Xie, Lei
    Fu, Zhonghua
    2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014
  • [38] APPLE FRUIT RECOGNITION BASED ON A DEEP LEARNING ALGORITHM USING AN IMPROVED LIGHTWEIGHT NETWORK
    Ji, J.
    Zhu, X.
    Ma, H.
    Wang, H.
    Jin, X.
    Zhao, K.
    APPLIED ENGINEERING IN AGRICULTURE, 2021, 37 (01) : 123 - 134
  • [39] Multimodal emotion recognition based on manifold learning and convolution neural network
    Zhang, Yong
    Cheng, Cheng
    Zhang, YiDie
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (23) : 33253 - 33268
  • [40] Action Recognition Based on CSI Signal Using Improved Deep Residual Network Model
    Zhao, Jian
    Chong, Shangwu
    Huang, Liang
    Li, Xin
    He, Chen
    Jia, Jian
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2022, 130 (03) : 1827 - 1851