Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images

Cited by: 0
Authors
Zheng, Qingping [1 ,2 ]
Guo, Yuanfan [2 ]
Deng, Jiankang [2 ]
Han, Jianhua [2 ]
Li, Ying [1 ]
Xu, Songcen [2 ]
Xu, Hang [2 ]
Affiliations
[1] Northwestern Polytech Univ, Xian, Peoples R China
[2] Huawei Noahs Ark Lab, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Stable Diffusion, a generative model used in text-to-image synthesis, frequently encounters resolution-induced composition problems when generating images of varying sizes. This issue primarily stems from the model being trained on pairs of single-scale images and their corresponding text descriptions. Moreover, direct training on images of unlimited sizes is infeasible, as it would require an immense number of text-image pairs and entail substantial computational expense. To overcome these challenges, we propose a two-stage pipeline named Any-Size-Diffusion (ASD), designed to efficiently generate well-composed HD images of any size while minimizing the need for high-memory GPU resources. Specifically, the first stage, dubbed Any Ratio Adaptability Diffusion (ARAD), leverages a selected set of images with a restricted range of aspect ratios to optimize the text-conditional diffusion model, thereby improving its ability to adjust composition to accommodate diverse image sizes. To support the creation of images at any desired size, we further introduce a technique called Fast Seamless Tiled Diffusion (FSTD) in the second stage. This method rapidly enlarges the ASD output to any high-resolution size while avoiding seaming artifacts and memory overloads. Experimental results on the LAION-COCO and MM-CelebA-HQ benchmarks show that ASD can produce well-structured images of arbitrary sizes, cutting inference time by 2x compared with the traditional tiled algorithm. The source code is available at https://github.com/ProAirVerse/Any-Size-Diffusion.
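The abstract describes FSTD only at a high level, so the following is a minimal sketch of the shifted-tile denoising idea it names, not the authors' implementation. Every name here is hypothetical (fstd_tiled_denoise, denoise_fn), the per-step random shift of the tile grid is an assumption about how seams are avoided, and denoise_fn stands in for one reverse-diffusion step of a pretrained model.

import torch

def fstd_tiled_denoise(latent, denoise_fn, tile=64, steps=30):
    """Illustrative sketch of seamless tiled diffusion over a large latent.

    latent:     (B, C, H, W) noisy latent, too large to denoise in one pass.
    denoise_fn: assumed callable applying one reverse-diffusion step to a
                tile; placeholder for a real pretrained-model call.
    """
    _, _, h, w = latent.shape
    for t in range(steps):
        # Shift the tile grid by a random offset each step so tile borders
        # never line up across steps; this suppresses fixed seam artifacts.
        dy, dx = torch.randint(0, tile, (2,)).tolist()
        shifted = torch.roll(latent, shifts=(-dy, -dx), dims=(2, 3))
        out = torch.empty_like(shifted)
        # Denoise tile by tile; peak memory is bounded by the tile size,
        # not by the full image resolution.
        for y in range(0, h, tile):
            for x in range(0, w, tile):
                patch = shifted[:, :, y:y + tile, x:x + tile]
                out[:, :, y:y + tile, x:x + tile] = denoise_fn(patch, t)
        latent = torch.roll(out, shifts=(dy, dx), dims=(2, 3))
    return latent

# Toy usage: a dummy denoiser stands in for a real diffusion-model step.
if __name__ == "__main__":
    dummy = lambda patch, t: patch * 0.99
    z = torch.randn(1, 4, 256, 256)
    z = fstd_tiled_denoise(z, dummy)
    print(z.shape)  # torch.Size([1, 4, 256, 256])

The memory saving comes from denoising fixed-size tiles rather than the full latent; the random grid shift at each step keeps tile borders from coinciding across steps, which is one plausible way to avoid the fixed seams of naive tiling.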
Pages: 7571-7578
Page count: 8