SAT: Self-Attention Control for Diffusion Models Training

Cited by: 0
Authors
Huang, Jing [1 ]
Zhang, Tianyi [1 ]
Shi, Wei [1 ]
Affiliations
[1] Huawei Singapore Res Ctr, Singapore, Singapore
Keywords
text-to-image diffusion model; LoRA; attention mask control; training strategy; GAN;
DOI
10.1145/3607827.3616838
CLC classification number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent text-to-image diffusion models show outstanding performance in generating high-quality images conditioned on textual prompts. However, a persistent challenge lies in generating detailed images, especially human-related ones, which often exhibit distorted faces and eyes. Existing approaches to this issue either rely on more specific yet lengthy prompts or apply restoration tools directly to the generated image. In addition, a few studies have shown that attention maps can enhance diffusion models' stability by guiding intermediate samples during the inference process. In this paper, we propose a novel training strategy (SAT) that improves sample quality during the training process. As a straightforward first step, we introduce blur guidance to refine intermediate samples, enabling diffusion models to produce higher-quality outputs under a moderate ratio of control. Improving upon this, SAT leverages the intermediate attention maps of diffusion models to further improve training sample quality. Specifically, SAT adversarially blurs only the regions that diffusion models attend to and guides them during the training process. We examine and compare both cross-attention mask control (CAC) and self-attention mask control (SAC) based on Stable Diffusion (SD) v1.5, and our results show that our method under SAC (i.e., SAT) improves the performance of Stable Diffusion.
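The abstract does not include an implementation, so the sketch below is only a minimal illustration of the self-attention mask control (SAC) idea it describes: attended regions of an intermediate sample are blurred and fed back during training. All names and values here (self_attention_mask, blur_attended_regions, the threshold, blur sigma, and latent shapes) are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of SAC-style blur guidance, assuming PyTorch/torchvision.
# Hypothetical names and hyper-parameters; not the paper's implementation.
import torch
from torchvision.transforms.functional import gaussian_blur


def self_attention_mask(attn_map: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Turn averaged self-attention weights (B, H*W) into a binary mask of attended locations."""
    lo = attn_map.amin(dim=1, keepdim=True)
    hi = attn_map.amax(dim=1, keepdim=True)
    attn_map = (attn_map - lo) / (hi - lo + 1e-8)  # min-max normalize per sample
    return (attn_map > threshold).float()


def blur_attended_regions(x: torch.Tensor, mask: torch.Tensor, sigma: float = 3.0) -> torch.Tensor:
    """Blur only the regions the model attends to; leave the rest of the sample untouched."""
    b, c, h, w = x.shape
    mask = mask.view(b, 1, h, w)                      # broadcast mask over channels
    x_blur = gaussian_blur(x, kernel_size=9, sigma=sigma)
    return mask * x_blur + (1.0 - mask) * x


if __name__ == "__main__":
    x = torch.randn(2, 4, 32, 32)        # e.g. intermediate SD latents (assumed shape)
    attn = torch.rand(2, 32 * 32)        # averaged self-attention weight per spatial location
    x_guided = blur_attended_regions(x, self_attention_mask(attn))
    print(x_guided.shape)                # torch.Size([2, 4, 32, 32])
```

In the paper's setup, such a guided sample would then be used adversarially during training at a moderate control ratio; the blurring step above is the only part the abstract describes concretely.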
Pages: 15-22
Number of pages: 8
Related papers
50 in total
  • [1] Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
    Hong, Susung
    Lee, Gyuseong
    Jang, Wooseok
    Kim, Seungryong
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7428 - 7437
  • [2] Synthesizer: Rethinking Self-Attention for Transformer Models
    Tay, Yi
    Bahri, Dara
    Metzler, Donald
    Juan, Da-Cheng
    Zhao, Zhe
    Zheng, Che
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7192 - 7203
  • [3] Attentional control and the self: The Self-Attention Network (SAN)
    Humphreys, Glyn W.
    Sui, Jie
    [J]. COGNITIVE NEUROSCIENCE, 2016, 7 (1-4) : 5 - 17
  • [4] Training a popular Mahjong agent with CNN and self-attention
    Liu, Liu
    Zhang, XiaoChuan
    He, ZeYa
    Liu, Jie
    [J]. INTERNATIONAL JOURNAL OF COMPUTING SCIENCE AND MATHEMATICS, 2024, 19 (02) : 157 - 166
  • [5] Continuous Self-Attention Models with Neural ODE Networks
    Zhang, Jing
    Zhang, Peng
    Kong, Baiwen
    Wei, Junqiu
    Jiang, Xin
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14393 - 14401
  • [6] Theoretical Limitations of Self-Attention in Neural Sequence Models
    Hahn, Michael
    [J]. TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2020, 8 : 156 - 171
  • [7] AttentionLite: Towards Efficient Self-Attention Models for Vision
    Kundu, Souvik
    Sundaresan, Sairam
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 2225 - 2229
  • [8] Stand-Alone Self-Attention in Vision Models
    Ramachandran, Prajit
    Parmar, Niki
    Vaswani, Ashish
    Bello, Irwan
    Levskaya, Anselm
    Shlens, Jonathon
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [9] Commentary: Attentional control and the self: The Self-Attention Network (SAN)
    Garcia, Adolfo M.
    Huepe, David
    Martinez, David
    Morales, Juan P.
    Huepe, Daniela
    Hurtado, Esteban
    Calvo, Noelia
    Ibanez, Agustin
    [J]. FRONTIERS IN PSYCHOLOGY, 2015, 6
  • [10] SAT-Net: Self-Attention and Temporal Fusion for Facial Action Unit Detection
    Li, Zhihua
    Zhang, Zheng
    Yin, Lijun
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5036 - 5043