ResMem-Net: memory based deep CNN for image memorability estimation

Cited by: 2
Authors
Praveen, Arockia [1]
Noorwali, Abdulfattah [2]
Samiayya, Duraimurugan [3]
Khan, Mohammad Zubair [4]
Vincent, Durai Raj P. M. [5]
Bashir, Ali Kashif [6]
Alagupandi, Vinoth [3]
Affiliations
[1] Phosphene AI, Madurai, Tamil Nadu, India
[2] Umm Al Qura Univ, Mecca, Saudi Arabia
[3] Optisol Business Solut, Chennai, Tamil Nadu, India
[4] Taibah Univ, Dept Comp Sci, Medina, Saudi Arabia
[5] Vellore Inst Technol, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
[6] Manchester Metropolitan Univ, Manchester, Lancs, England
Keywords
Deep Learning; Image Memorability; Visual Emotions; Saliency; Object Interestingness
DOI
10.7717/peerj-cs.767
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Image memorability is a hard problem in image processing because of its subjective nature. With the introduction of deep learning and the wide availability of data and GPUs, however, great strides have been made in predicting the memorability of an image. In this paper, we propose a novel deep learning architecture called ResMem-Net, a hybrid of LSTM and CNN that uses information from the hidden layers of the CNN to compute the memorability score of an image. The intermediate layers are important for predicting the output because they contain information about the intrinsic properties of the image. The proposed architecture automatically learns visual emotions and saliency, as shown by the heatmaps generated using the GradRAM technique. We also use the heatmaps and results to analyze and answer one of the most important questions in image memorability: "What makes an image memorable?". The model is trained and evaluated on the publicly available Large-scale Image Memorability dataset (LaMem) from MIT. The results show that the model achieves a rank correlation of 0.679 and a mean squared error of 0.011, which is better than the current state-of-the-art models and close to human consistency (ρ = 0.68). The proposed architecture also has significantly fewer parameters than the state-of-the-art architecture, making it memory efficient and suitable for production.
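The two evaluation metrics named in the abstract, rank correlation and mean squared error, can be illustrated with a minimal sketch. It assumes the rank correlation is Spearman's ρ, which the abstract does not state explicitly. The function names and the toy scores below are hypothetical and are not taken from the paper or from LaMem.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (inputs must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman_rank_correlation(predicted, target):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(target))


def mean_squared_error(predicted, target):
    """Mean of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical memorability scores in [0, 1], for illustration only.
    predicted_scores = [0.71, 0.64, 0.88, 0.52, 0.79]
    human_scores = [0.75, 0.60, 0.91, 0.58, 0.70]
    print("rank correlation:", spearman_rank_correlation(predicted_scores, human_scores))
    print("mean squared error:", mean_squared_error(predicted_scores, human_scores))
```

Rank correlation rewards a model for ordering images correctly by memorability even when the absolute scores are off, while mean squared error penalizes the size of the score errors, which is presumably why the abstract reports both.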
Pages: 1-27
Page count: 27
Related papers
50 in total
  • [21] Deep Marginal Fisher Analysis Based CNN for Image Representation and Classification
    Cai, Xun
    Chai, Jiajing
    Gao, Yanbo
    Li, Shuai
    Zhu, Bo
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 181 - 189
  • [22] CNN-based Pedestrian Orientation Estimation from a Single Image
    Kumamoto, Koji
    Yamada, Keiichi
    [J]. PROCEEDINGS 2017 4TH IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR), 2017, : 13 - 18
  • [23] Discrete Spherical Image Representation for CNN-Based Inclination Estimation
    Shan, Yuhao
    Li, Shigang
    [J]. IEEE ACCESS, 2020, 8 : 2008 - 2022
  • [24] CNN-SCNet: A CNN net-based deep learning framework for infant cry detection in household setting
    Jahangir, Raiyan
    [J]. ENGINEERING REPORTS, 2024, 6 (06)
  • [25] TD-Net: unsupervised medical image registration network based on Transformer and CNN
    Song, Lei
    Liu, Guixia
    Ma, Mingrui
    [J]. APPLIED INTELLIGENCE, 2022, 52 : 18201 - 18209
  • [26] TD-Net: unsupervised medical image registration network based on Transformer and CNN
    Song, Lei
    Liu, Guixia
    Ma, Mingrui
    [J]. APPLIED INTELLIGENCE, 2022, 52 (15) : 18201 - 18209
  • [27] CLDE-Net: crowd localization and density estimation based on CNN and transformer network
    Hu, Yaocong
    Lin, Yuanyuan
    Yang, Huicheng
    Liu, Bingyou
    Wan, Guoyang
    Hong, Jinwen
    Xie, Chao
    Wang, Wei
    Lu, Xiaobo
    [J]. MULTIMEDIA SYSTEMS, 2024, 30 (03)
  • [28] CNN-based Euler's Elastica Inpainting with Deep Energy and Deep Image Prior
    Schrader, Karl
    Alt, Tobias
    Weickert, Joachim
    Ertel, Michael
    [J]. 2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), 2022,
  • [29] EU-net: An automated CNN based ebola U-net model for efficient medical image segmentation
    Rayachoti, Eswaraiah
    Vedantham, Ramachandran
    Gundabatini, Sanjay Gandhi
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (30) : 74323 - 74347
  • [30] Deep BCD-Net Using Identical Encoding-Decoding CNN Structures for Iterative Image Recovery
    Chun, Il Yong
    Fessler, Jeffrey A.
    [J]. PROCEEDINGS 2018 IEEE 13TH IMAGE, VIDEO, AND MULTIDIMENSIONAL SIGNAL PROCESSING WORKSHOP (IVMSP), 2018,