An Efficient Transformer Based on Global and Local Self-Attention for Face Photo-Sketch Synthesis

Cited by: 7
Authors
Yu, Wangbo [1]
Zhu, Mingrui [1]
Wang, Nannan [1]
Wang, Xiaoyu [2]
Gao, Xinbo [3]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, State Key Lab Integrated Serv Networks, Xian 710071, Shaanxi, Peoples R China
[2] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Face photo-sketch synthesis; transformer; global self-attention; local self-attention; generative adversarial networks (GANs)
DOI
10.1109/TIP.2022.3229614
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Face photo-sketch synthesis has been dominated by convolutional neural networks (CNNs), especially CNN-based generative adversarial networks (GANs), because their strong texture modeling capabilities let them generate more realistic face photos and sketches than traditional methods. However, because of the locality and spatial invariance of CNNs, they are weak at capturing global and structural information, which is extremely important for face images. Inspired by the recent success of the Transformer in vision tasks, we propose replacing CNNs with Transformers, which can model long-range dependencies, to synthesize more structured and realistic face images. However, existing vision Transformers are mainly designed for high-level vision tasks and lack the dense prediction ability needed to generate high-resolution images, because the computational complexity of their self-attention mechanism is quadratic. In addition, the original Transformer cannot model local correlations, which is an important capability for image generation. To address these challenges, we propose two types of memory-friendly Transformer encoders, one for processing local correlations via local self-attention and another for modeling global information via global self-attention. By integrating the two proposed Transformer encoders, we present an efficient GL-Transformer for face photo-sketch synthesis, which can synthesize realistic face photo/sketch images from coarse to fine. Extensive experiments demonstrate that our model achieves performance comparable to or better than state-of-the-art CNN-based methods, both qualitatively and quantitatively.
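Note on the quadratic cost mentioned in the abstract: the sketch below is a generic illustration, not the paper's GL-Transformer encoders, whose design this record does not describe. It assumes single-head attention with no learned projections (queries, keys, and values are all the input tokens) and non-overlapping 1-D windows; the function names `global_self_attention`, `local_self_attention`, and `score_entries` are hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_self_attention(x):
    """Self-attention over all N tokens; builds an N x N score matrix."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)           # shape (N, N)
    return softmax(scores) @ x              # shape (N, d)


def local_self_attention(x, window):
    """Self-attention restricted to non-overlapping windows of `window` tokens."""
    n, d = x.shape
    assert n % window == 0, "N must be divisible by the window size"
    xw = x.reshape(n // window, window, d)  # (N/window, window, d)
    scores = xw @ xw.transpose(0, 2, 1) / np.sqrt(d)  # (N/window, window, window)
    return (softmax(scores) @ xw).reshape(n, d)


def score_entries(n, window=None):
    """Number of attention scores stored: N^2 globally, N * window locally."""
    return n * n if window is None else n * window


# Assumed example: a 64 x 64 feature map flattened to 4096 tokens of width 8.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((4096, 8))
local_out = local_self_attention(tokens, window=64)
```

Under these assumptions, 4096 tokens need 4096 × 4096 = 16,777,216 scores for global attention but only 4096 × 64 = 262,144 when attention is limited to windows of 64 tokens, which is why restricting or compressing attention matters for high-resolution synthesis.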
Pages: 483-495
Number of pages: 13
Related papers
50 in total
  • [1] A Sketch-Transformer Network for Face Photo-Sketch Synthesis
    Zhu, Mingrui
    Liang, Changcheng
    Wang, Nannan
    Wang, Xiaoyu
    Li, Zhifeng
    Gao, Xinbo
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 1352 - 1358
  • [2] Multi-Scale Gradients Self-Attention Residual Learning for Face Photo-Sketch Transformation
    Duan, Shuchao
    Chen, Zhenxue
    Wu, Q. M. Jonathan
    Cai, Lei
    Lu, Dan
    [J]. IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2021, 16 : 1218 - 1230
  • [3] Face Photo-Sketch Synthesis and Recognition
    Wang, Xiaogang
    Tang, Xiaoou
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (11) : 1955 - 1967
  • [4] Face Photo-Sketch Recognition using Local and Global Texture Descriptors
    Galea, Christian
    Farrugia, Reuben A.
    [J]. 2016 24TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2016, : 2240 - 2244
  • [5] A Simple Framework for Face Photo-Sketch Synthesis
    Li, Xuewei
    Cao, Xiaochun
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2012, 2012
  • [6] Knowledge Distillation for Face Photo-Sketch Synthesis
    Zhu, Mingrui
    Li, Jie
    Wang, Nannan
    Gao, Xinbo
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02) : 893 - 906
  • [7] Anchored Neighborhoods Search Based on Global Dictionary Atoms for Face Photo-Sketch Synthesis
    Liu, Feng
    Xu, Ran
    Zheng, Jieying
    Lin, Qiuli
    Gan, Zongliang
    [J]. ELEVENTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2019), 2019, 11179
  • [8] Face Sketch Synthesis From a Single Photo-Sketch Pair
    Zhang, Shengchuan
    Gao, Xinbo
    Wang, Nannan
    Li, Jie
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2017, 27 (02) : 275 - 287
  • [9] Face Photo-Sketch Synthesis via Knowledge Transfer
    Zhu, Mingrui
    Wang, Nannan
    Gao, Xinbo
    Li, Jie
    Li, Zhifeng
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 1048 - 1054
  • [10] A Deep Collaborative Framework for Face Photo-Sketch Synthesis
    Zhu, Mingrui
    Li, Jie
    Wang, Nannan
    Gao, Xinbo
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (10) : 3096 - 3108