Unified Image Aesthetic and Emotional Prediction Based on Deep Multi-task Learning

Authors
Shen Z. [1]
Cui C.-R. [2]
Dong G.-X. [2]
Yu J. [3]
Huang J. [1]
Yin Y.-L. [1]
Affiliations
[1] School of Software, Shandong University, Jinan
[2] School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan
[3] Department of Computer Science and Engineering, Lehigh University, Bethlehem, 18015, PA
Source
Ruan Jian Xue Bao/Journal of Software | 2023, Vol. 34, No. 05
Keywords
adaptive feature interaction; aesthetic assessment; deep multi-task learning; emotion analysis; gradient balancing strategy
DOI
10.13328/j.cnki.jos.006487
Abstract
Image aesthetic assessment and image emotion analysis aim to enable computers to recognize the aesthetic and emotional responses, respectively, that visual stimuli evoke in humans. Existing research usually treats them as two independent tasks. However, people's aesthetic and emotional responses do not arise in isolation; from the perspective of psychological cognition, the two are interrelated and influence each other. This study therefore follows the idea of deep multi-task learning to handle image aesthetic assessment and emotion analysis within a unified framework and to explore the relationship between them. Specifically, a novel adaptive feature interaction module is proposed to connect the backbone networks of the two tasks and produce a unified prediction. In addition, a dynamic feature interaction mechanism is introduced to adaptively determine the degree of feature interaction between the tasks according to their feature dependencies. Because the two tasks differ in complexity and convergence speed, the study also proposes a novel gradient balancing strategy for updating the structural parameters of the multi-task network, so that the network parameters of each task can be learned smoothly under the unified prediction framework. Furthermore, the study constructs UAE, a large-scale unified image aesthetic and emotional dataset. According to the study, UAE is the first image collection containing both aesthetic and emotional labels. Finally, the model and code of the proposed method, together with the UAE dataset, have been released at https://github.com/zhenshenmla/Aesthetic-Emotion-Dataset. © 2023 Chinese Academy of Sciences. All rights reserved.
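The abstract does not give the formulas behind the feature interaction or the gradient balancing, so the following is only a rough illustrative sketch of the two general ideas, not the authors' method. It assumes a similarity-driven gate for exchanging features between the two task branches and inverse-gradient-norm weights for balancing the two losses. Every name in it (`interact`, `balance_weights`, `scale`) is hypothetical; the code released at the repository above is the authoritative implementation.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash a similarity into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def interact(feat_aes, feat_emo, scale=1.0):
    """Hypothetical gated feature exchange between the two task branches.

    ASSUMPTION: the gate is a sigmoid of the cosine similarity, so more
    strongly aligned features are mixed more strongly. The paper's actual
    dependency measure is not stated in the abstract.
    """
    gate = sigmoid(scale * cosine(feat_aes, feat_emo))
    new_aes = [a + gate * e for a, e in zip(feat_aes, feat_emo)]
    new_emo = [e + gate * a for a, e in zip(feat_aes, feat_emo)]
    return new_aes, new_emo, gate


def balance_weights(grad_norms, eps=1e-12):
    """Hypothetical loss weights that equalize per-task gradient magnitude.

    ASSUMPTION: weights are proportional to the inverse gradient norm and
    rescaled to sum to the number of tasks. This is a generic balancing
    heuristic, not necessarily the strategy proposed in the paper.
    """
    inverse = [1.0 / (g + eps) for g in grad_norms]
    total = sum(inverse)
    return [len(grad_norms) * w / total for w in inverse]


if __name__ == "__main__":
    aes, emo, gate = interact([1.0, 0.0], [0.0, 1.0])
    print(aes, emo, gate)
    print(balance_weights([2.0, 0.5]))
```

In this sketch the task with the larger gradient norm receives the smaller weight, so neither branch dominates the shared update; a learned gate or a different normalization would be equally plausible readings of the abstract.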
Pages: 2494-2506
Page count: 12