How Stable Diffusion Works and Applications of CAC


Preface

This post records my notes on the topic.

1. How does Stable Diffusion work?

Stable Diffusion is based on a latent diffusion model, which is trained to generate latent (compressed) representations of images rather than images in pixel space.
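To get a feel for how much the latent space compresses an image, here is a small arithmetic sketch. It assumes the Stable Diffusion v1 setting: 512x512 RGB images and a VAE that downsamples by a factor of 8 into 4 latent channels. The 4 channels and the 64x64 latent size agree with in_channels and sample_size in the U-Net config shown later. The helper names are illustrative only.

```python
# Sketch: size of an image vs. its VAE latent, assuming SD v1 settings
# (512x512 RGB input, VAE downsampling factor 8, 4 latent channels).

def latent_shape(height: int, width: int, vae_scale_factor: int = 8, latent_channels: int = 4):
    """Return the (channels, height, width) of the latent for an image of the given size."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("image size must be divisible by the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


def num_elements(shape) -> int:
    """Product of the dimensions of a shape tuple."""
    n = 1
    for d in shape:
        n *= d
    return n


image_shape = (3, 512, 512)          # channels, height, width of the RGB image
z_shape = latent_shape(512, 512)     # shape of the corresponding latent
ratio = num_elements(image_shape) // num_elements(z_shape)

print(z_shape)  # (4, 64, 64)
print(ratio)    # 48
```

The diffusion process therefore operates on roughly 48 times fewer values than pixel-space diffusion at the same resolution, which is the main reason latent diffusion is cheaper to train and sample.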

Latent diffusion has three main components:

An autoencoder (VAE).
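The VAE has two parts: an encoder and a decoder. The encoder converts an image into a low-dimensional latent representation, which serves as the input to the U-Net. The decoder does the reverse and transforms a latent representation back into an image. During diffusion training, the encoder is used to obtain the latent representations (latents) of the training images for the forward diffusion process, which applies more and more noise at each step. During inference, the VAE decoder converts the denoised latents produced by the reverse diffusion process back into images.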
A U-Net.
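Both the encoder part and the decoder part of the U-Net are made of ResNet blocks. The encoder compresses an image representation into a lower-resolution representation, and the decoder decodes that lower-resolution representation back into the original higher-resolution representation. More specifically, the U-Net outputs a predicted noise residual, i.e. an estimate of the noise contained in its noisy input, which is used to compute the predicted denoised image representation.
To keep the U-Net from losing important information during downsampling, shortcut (skip) connections are usually added between the downsampling ResNets of the encoder and the upsampling ResNets of the decoder. In addition, the Stable Diffusion U-Net can condition its output on the text embeddings through cross-attention layers, which are inserted between the ResNet blocks in both the encoder and the decoder parts of the U-Net.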
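A text encoder.
The text encoder is responsible for transforming the input prompt, e.g. "An astronaut riding a horse", into an embedding space that the U-Net can understand. It is usually a simple Transformer-based encoder that maps a sequence of input tokens to a sequence of latent text embeddings.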
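The cross-attention conditioning described above can be illustrated with a toy single-head NumPy sketch. Queries come from the U-Net's spatial features (one token per latent position), while keys and values come from the text embeddings, so every spatial position receives a weighted mix of text-token information. The sizes are assumptions chosen to resemble SD v1: 77 text tokens of width 768 (matching cross_attention_dim in the config later), 320 feature channels and a head width of 40. The projection matrices are random stand-ins for learned weights. The real layers are multi-head and have an output projection, and both are omitted here. The attention map probs is the kind of quantity that cross-attention-based editing methods inspect or modify.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(features, text_emb, w_q, w_k, w_v):
    """Single-head cross-attention (toy version, no output projection).

    features: (n_pixels, c) flattened U-Net feature map (image side)
    text_emb: (n_tokens, d_text) text-encoder output (encoder_hidden_states)
    Returns attended features (n_pixels, c) and attention probs (n_pixels, n_tokens).
    """
    q = features @ w_q                      # (n_pixels, d_head): queries from the image
    k = text_emb @ w_k                      # (n_tokens, d_head): keys from the text
    v = text_emb @ w_v                      # (n_tokens, c): values from the text
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = softmax(scores, axis=-1)        # each position's weights over the text tokens
    return probs @ v, probs


rng = np.random.default_rng(0)
h = w = 8                                   # toy spatial size (SD v1's first level is 64x64)
c, d_text, d_head, n_tokens = 320, 768, 40, 77
features = rng.standard_normal((h * w, c))
text_emb = rng.standard_normal((n_tokens, d_text))
w_q = rng.standard_normal((c, d_head)) / np.sqrt(c)
w_k = rng.standard_normal((d_text, d_head)) / np.sqrt(d_text)
w_v = rng.standard_normal((d_text, c)) / np.sqrt(d_text)

out, probs = cross_attention(features, text_emb, w_q, w_k, w_v)
features_out = features + out               # residual add, as is typical in transformer blocks
print(out.shape, probs.shape)               # (64, 320) (64, 77)
```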

2. Usage steps

pipeline: CycleDiffusionPipeline
scheduler: DDIMScheduler

scheduler.config (the '_class_name' entry reads 'PNDMScheduler' and the PNDM-specific 'skip_prk_steps' key is present, presumably because the DDIMScheduler was created from the scheduler config saved in the checkpoint):

FrozenDict([('num_train_timesteps', 1000),
            ('beta_start', 0.00085),
            ('beta_end', 0.012),
            ('beta_schedule', 'scaled_linear'),
            ('trained_betas', None),
            ('clip_sample', False),
            ('set_alpha_to_one', False),
            ('steps_offset', 1),
            ('prediction_type', 'epsilon'),
            ('_class_name', 'PNDMScheduler'),
            ('_diffusers_version', '0.7.0.dev0'),
            ('skip_prk_steps', True)])
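To make the scheduler config concrete, here is a NumPy sketch of what its main fields control. It is written from the DDIM update rule and is not copied from the diffusers source. beta_schedule='scaled_linear' interpolates linearly in sqrt(beta) between beta_start and beta_end over num_train_timesteps. steps_offset=1 shifts the evenly spaced inference timesteps by one. set_alpha_to_one=False makes the "previous" cumulative alpha after the last step fall back to alphas_cumprod[0]. prediction_type='epsilon' means the U-Net output is treated as the predicted noise, from which the clean latent is estimated. The choice of 50 inference steps, the "leading" timestep spacing, and the deterministic (eta = 0) update are assumptions.

```python
import numpy as np

# Values copied from the scheduler.config dump above.
NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 0.00085, 0.012
STEPS_OFFSET = 1

# 'scaled_linear': linear in sqrt(beta), then squared.
betas = np.linspace(BETA_START ** 0.5, BETA_END ** 0.5, NUM_TRAIN_TIMESTEPS) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)
# set_alpha_to_one=False: the "previous" alpha after the last step is alphas_cumprod[0].
final_alpha_cumprod = alphas_cumprod[0]


def ddim_timesteps(num_inference_steps: int) -> np.ndarray:
    """Evenly spaced descending timesteps shifted by steps_offset (assumed 'leading' spacing)."""
    step_ratio = NUM_TRAIN_TIMESTEPS // num_inference_steps
    return np.arange(num_inference_steps)[::-1] * step_ratio + STEPS_OFFSET


def ddim_step(eps: np.ndarray, t: int, x_t: np.ndarray, num_inference_steps: int):
    """One deterministic DDIM update (eta = 0) for an epsilon-predicting model.

    Returns (x_prev, x0_pred), where x0_pred is the predicted denoised latent.
    """
    prev_t = t - NUM_TRAIN_TIMESTEPS // num_inference_steps
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[prev_t] if prev_t >= 0 else final_alpha_cumprod
    # prediction_type='epsilon': recover x0 from x_t and the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # clip_sample=False, so x0_pred is not clipped to [-1, 1].
    x_prev = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x_prev, x0_pred


ts = ddim_timesteps(50)
print(ts[:3], ts[-1])  # [981 961 941] 1
```

If the noise prediction is exact, x0_pred recovers the clean latent exactly. The predicted clean latent is also a convenient intermediate result to decode and inspect during sampling.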

model: UNet2DConditionModel

model.config

FrozenDict([('sample_size', 64),
            ('in_channels', 4),
            ('out_channels', 4),
            ('center_input_sample', False),
            ('flip_sin_to_cos', True),
            ('freq_shift', 0),
            ('down_block_types',
             ['CrossAttnDownBlock2D',
              'CrossAttnDownBlock2D',
              'CrossAttnDownBlock2D',
              'DownBlock2D']),
            ('mid_block_type', 'UNetMidBlock2DCrossAttn'),
            ('up_block_types',
             ['UpBlock2D',
              'CrossAttnUpBlock2D',
              'CrossAttnUpBlock2D',
              'CrossAttnUpBlock2D']),
            ('only_cross_attention', False),
            ('block_out_channels', [320, 640, 1280, 1280]),
            ('layers_per_block', 2),
            ('downsample_padding', 1),
            ('mid_block_scale_factor', 1),
            ('act_fn', 'silu'),
            ('norm_num_groups', 32),
            ('norm_eps', 1e-05),
            ('cross_attention_dim', 768),
            ('attention_head_dim', 8),
            ('dual_cross_attention', False),
            ('use_linear_projection', False),
            ('class_embed_type', None),
            ('num_class_embeds', None),
            ('upcast_attention', False),
            ('resnet_time_scale_shift', 'default'),
            ('_class_name', 'UNet2DConditionModel'),
            ('_diffusers_version', '0.2.2'),
            ('_name_or_path', 'CompVis/stable-diffusion-v1-4')])
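The block lists in this config determine the resolution and width of each level of the U-Net's encoder path. The following sketch walks through the config. It assumes, as in diffusers' UNet2DConditionModel, that every down block except the last ends with a 2x downsample. It also reflects a known naming quirk of SD v1 configs: attention_head_dim=8 is effectively used as the number of attention heads, so the per-head width is channels // 8. The helper function is illustrative and is not part of diffusers.

```python
# Values copied from the model.config dump above.
config = {
    "sample_size": 64,
    "down_block_types": ["CrossAttnDownBlock2D", "CrossAttnDownBlock2D",
                         "CrossAttnDownBlock2D", "DownBlock2D"],
    "block_out_channels": [320, 640, 1280, 1280],
    "attention_head_dim": 8,  # effectively the number of heads in SD v1 configs
}


def describe_down_path(cfg):
    """List (block_type, resolution, channels, head_dim) for each down block.

    Assumes every down block except the last ends with a 2x downsample.
    head_dim is None for blocks without cross-attention.
    """
    res = cfg["sample_size"]
    heads = cfg["attention_head_dim"]
    n = len(cfg["down_block_types"])
    levels = []
    for i, (kind, ch) in enumerate(zip(cfg["down_block_types"], cfg["block_out_channels"])):
        head_dim = ch // heads if kind.startswith("CrossAttn") else None
        levels.append((kind, res, ch, head_dim))
        if i < n - 1:      # the final down block keeps its resolution
            res //= 2
    return levels


levels = describe_down_path(config)
for level in levels:
    print(level)
# ('CrossAttnDownBlock2D', 64, 320, 40)
# ('CrossAttnDownBlock2D', 32, 640, 80)
# ('CrossAttnDownBlock2D', 16, 1280, 160)
# ('DownBlock2D', 8, 1280, None)
```

The mid block (UNetMidBlock2DCrossAttn) then runs at 8x8 with 1280 channels, and the up blocks mirror this path in reverse order. That is why up_block_types starts with the attention-free UpBlock2D.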

Summary

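In this post I studied the model interfaces that the Hugging Face diffusers repository provides for the models from various diffusion papers, the generation tasks they handle, and how to load and configure pipelines, schedulers, and models. To view the intermediate results of the text-guided img2img task, a customized sampler is needed. At the time of writing, I had not yet managed to get the encoder_hidden_states passed in from the text encoder working.
Article source: https://www.toymoban.com/news/detail-505162.html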
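One way to get intermediate results is to write the denoising loop yourself and keep a copy of the latents every few steps. The sketch below is generic and hypothetical. Its step_fn stands in for "run the U-Net with encoder_hidden_states=text_embeddings, then call the scheduler's step". In diffusers, DDIMScheduler.step(...) returns an output whose prev_sample is the next latent and whose pred_original_sample is the current estimate of the clean latent. Either of these could be decoded with the VAE for inspection. The toy step used here simply shrinks the latent each step so the example runs without any model.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

# Hypothetical interface: step_fn(latents, t) -> (next_latents, predicted_clean_latents)
StepFn = Callable[[np.ndarray, int], Tuple[np.ndarray, np.ndarray]]


def sample_with_intermediates(latents: np.ndarray, timesteps: Sequence[int],
                              step_fn: StepFn, save_every: int = 10):
    """Run a denoising loop and record (t, predicted clean latent) every save_every steps.

    The final step is always recorded.
    """
    intermediates: List[Tuple[int, np.ndarray]] = []
    for i, t in enumerate(timesteps):
        latents, x0_pred = step_fn(latents, t)
        if i % save_every == 0 or i == len(timesteps) - 1:
            intermediates.append((t, x0_pred.copy()))
    return latents, intermediates


def toy_step(latents: np.ndarray, t: int):
    """Stand-in for U-Net + scheduler.step: shrink the latent by 10% each step."""
    nxt = 0.9 * latents
    return nxt, nxt


timesteps = list(range(981, 0, -20))  # 50 DDIM-style timesteps: 981, 961, ..., 1
final, saved = sample_with_intermediates(np.ones((4, 64, 64)), timesteps, toy_step)
print(len(timesteps), [t for t, _ in saved])  # 50 [981, 781, 581, 381, 181, 1]
```

Each saved latent would then be passed through the VAE decoder to view it as an image.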
