Cover-based multiple book genre recognition using an improved multimodal network

Cited by: 3
Authors
Rasheed, Assad [1 ]
Umar, Arif Iqbal [1 ]
Shirazi, Syed Hamad [1 ]
Khan, Zakir [1 ]
Shahzad, Muhammad [1 ]
Affiliations
[1] Hazara Univ, Dept Informat Technol, Mansehra, Pakistan
Keywords
Book covers classification; CNN; Image classifiers; Multimodal learning; Text classifiers; CLASSIFICATION;
DOI
10.1007/s10032-022-00413-8
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the idiom that warns against prejudging something by its outward appearance, we use deep learning to investigate whether a book can be judged by its cover, or more precisely, by its title text and cover design. Classification was carried out using three strategies: text only, image only, and combined text and image. State-of-the-art convolutional neural network (CNN) models were used to classify books from their cover images. Gram and squeeze-and-excitation (SE) layers were embedded in these networks as attention units to learn optimal features and identify distinguishing characteristics of the cover image; the Gram layer enabled more accurate multi-genre classification than the SE layer. Text-based classification was performed with word-based, character-based, and feature-engineering-based models. We designed the EXplicit interActive Network (EXAN), composed of context-relevant layers and multi-level attention layers, to learn features from book titles. For multimodal classification, we designed an improved fusion architecture that applies an attention mechanism between the modalities. The disparity in convergence speed between modalities is addressed by pre-training each sub-network independently before end-to-end training of the full model. Two book cover datasets were used in this study. Results show that text-based classifiers outperform image-based classifiers. The proposed multimodal network outperformed all other models on this task, achieving the highest accuracies of 69.09% and 38.12% on the Latin and Arabic book cover datasets, respectively. Similarly, the proposed EXAN surpassed existing text classification models, scoring the highest prediction rates of 65.20% and 33.8% on the Latin and Arabic book cover datasets.
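The abstract describes two architectural ingredients: channel attention (SE layers) inside the image CNN, and an attention-based fusion of separately pre-trained image and text branches. The PyTorch sketch below illustrates these ideas under our own assumptions; the module names, layer sizes, and the simple softmax gating used for fusion are illustrative and do not reproduce the authors' EXAN or Gram-layer implementations.

```python
# Minimal sketch (PyTorch), assuming a pooled image embedding and a pooled text
# embedding of equal dimension; not the authors' published code.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: channel-wise reweighting of CNN feature maps."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global spatial average
        self.fc = nn.Sequential(                 # excitation: per-channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # rescale each channel


class AttentionFusion(nn.Module):
    """Fuse image and text embeddings with learned per-sample modality weights."""

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([img, txt], dim=-1))  # attention over modalities
        fused = w[:, :1] * img + w[:, 1:] * txt       # weighted sum of modalities
        return self.classifier(fused)


if __name__ == "__main__":
    se = SEBlock(channels=64)
    feats = se(torch.randn(2, 64, 7, 7))              # reweighted CNN feature maps
    fusion = AttentionFusion(dim=256, num_classes=30)
    logits = fusion(torch.randn(2, 256), torch.randn(2, 256))
    print(feats.shape, logits.shape)
```

In the setup the abstract describes, the image CNN and the text encoder producing `img` and `txt` would each be pre-trained on its own modality before the fused model is fine-tuned end to end, which the authors credit with addressing the modalities' differing convergence speeds.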
Pages: 65 - 88
Page count: 24
Related papers
50 records in total
  • [1] Cover-based multiple book genre recognition using an improved multimodal network
    Rasheed, Assad
    Umar, Arif Iqbal
    Shirazi, Syed Hamad
    Khan, Zakir
    Shahzad, Muhammad
    International Journal on Document Analysis and Recognition (IJDAR), 2023, 26 : 65 - 88
  • [2] Reading Book by the Cover - Book Genre Detection Using Short Descriptions
    Sobkowicz, Antoni
    Kozlowski, Marek
    Buczkowski, Przemyslaw
    MAN-MACHINE INTERACTIONS 5, ICMMI 2017, 2018, 659 : 439 - 448
  • [3] Cracking Classification Using Minimum Rectangular Cover-Based Support Vector Machine
    Wang, Shaofan
    Qiu, Shi
    Wang, Wenjuan
    Xiao, Danny
    Wang, Kelvin C. P.
    JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 2017, 31 (05)
  • [4] Multimodal movie genre classification using recurrent neural network
    Behrouzi, Tina
    Toosi, Ramin
    Akhaee, Mohammad Ali
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (04) : 5763 - 5784
  • [5] Multimodal movie genre classification using recurrent neural network
    Behrouzi, Tina
    Toosi, Ramin
    Akhaee, Mohammad Ali
    Multimedia Tools and Applications, 2023, 82 : 5763 - 5784
  • [6] A cover-based method to assess forest characteristics using inventory data and GIS
    Westfall, James A.
    Morin, Randall S.
    FOREST ECOLOGY AND MANAGEMENT, 2013, 298 : 93 - 100
  • [7] Genre Recognition of Artworks using Convolutional Neural Network
    Hosain, Md Kamran
    Harun-Ur-Rashid
    Taher, Tasnova Bintee
    Rahman, Mohammad Masudur
    2020 23RD INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (ICCIT 2020), 2020,
  • [8] A framework for book cover recognition based on automatic location and segmentation
    Liu, Yujie
    Li, Wei
    Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2012, 24 (11): : 1464 - 1470
  • [9] Music genre classification and recognition using convolutional neural network
    Narkhede N.
    Mathur S.
    Bhaskar A.
    Kalla M.
    Multimedia Tools and Applications, 2025, 84 (4) : 1845 - 1860
  • [10] MF-Net: a multimodal fusion network for emotion recognition based on multiple physiological signals
    Zhu, Lei
    Ding, Yu
    Huang, Aiai
    Tan, Xufei
    Zhang, Jianhai
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)