A Summary of Stable Diffusion Pretrained Models


The model repositories scattered across GitHub are rather disorganized, so this post collects them in one place for easy lookup.

Stable UnCLIP 2.1

New stable diffusion finetune (Stable unCLIP 2.1, Hugging Face) at 768x768 resolution, based on SD2.1-768.

This model allows for image variations and mixing operations as described in Hierarchical Text-Conditional Image Generation with CLIP Latents, and, thanks to its modularity, can be combined with other models such as KARLO.

Comes in two variants:

  • sd21-unclip-l.ckpt:
    conditioned on CLIP ViT-L image embeddings
  • sd21-unclip-h.ckpt:
    conditioned on CLIP ViT-H image embeddings

Instructions are available in the Stability-AI/stablediffusion repository.
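
As a quick illustration, the unCLIP checkpoints can also be driven through the diffusers library (not part of the original instructions). The sketch below is a minimal example; the model id stabilityai/stable-diffusion-2-1-unclip and the file names are assumptions for illustration, not taken from this post.

```python
import torch
from diffusers import StableUnCLIPImg2ImgPipeline
from diffusers.utils import load_image

# Load the ViT-H unCLIP variant; the model id is an assumption based on the
# Hugging Face hosting (the ViT-L variant should live under
# stabilityai/stable-diffusion-2-1-unclip-small).
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16
).to("cuda")

# Produce a 768x768 variation of an input image ("input.png" is a placeholder);
# an optional text prompt can additionally be passed to steer the variation.
init_image = load_image("input.png")
variation = pipe(init_image).images[0]
variation.save("variation.png")
```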

Version 2.1

New stable diffusion model (Stable Diffusion 2.1-v) at 768x768 resolution and (Stable Diffusion 2.1-base) at 512x512 resolution, both based on the same number of parameters and architecture as 2.0 and fine-tuned from 2.0, on a less restrictive NSFW filtering of the LAION-5B dataset.

By default, the attention operation of the model is evaluated at full precision when xformers is not installed. To enable fp16 (which can cause numerical instabilities with the vanilla attention module on the v2.1 model), run your script with ATTN_PRECISION=fp16 python <thescript.py>.
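
Alternatively, the 2.1 checkpoint can be run in fp16 through the diffusers library, bypassing the original repository's scripts entirely. This is a minimal sketch; the model id stabilityai/stable-diffusion-2-1 is an assumption based on the Hugging Face hosting of v2-1_768-ema-pruned.

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

# Model id is an assumption based on the Hugging Face hosting of the
# 768x768 v2.1 checkpoint (v2-1_768-ema-pruned).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")

# Sample a 768x768 image in fp16.
image = pipe(
    "a professional photograph of an astronaut riding a horse",
    height=768, width=768,
).images[0]
image.save("astronaut.png")
```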

Version 2.0

  • New stable diffusion model (Stable Diffusion 2.0-v) at 768x768 resolution. Same number of parameters in the U-Net as 1.5, but uses OpenCLIP-ViT/H as the text encoder and is trained from scratch. SD 2.0-v is a so-called v-prediction model.
  • The above model is finetuned from SD 2.0-base (512-base-ema.ckpt), which was trained as a standard noise-prediction model on 512x512 images and is also made available.
  • Added an x4 upscaling latent text-guided diffusion model.
  • New depth-guided stable diffusion model, finetuned from SD 2.0-base. The model is conditioned on monocular depth estimates inferred via MiDaS and can be used for structure-preserving img2img and shape-conditional synthesis (a usage sketch follows this list).
  • A text-guided inpainting model, finetuned from SD 2.0-base.
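
A minimal sketch of the depth-guided model mentioned above, run through the diffusers library rather than the original repository's scripts; the model id stabilityai/stable-diffusion-2-depth and the file names are assumptions for illustration.

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image

# Model id is an assumption based on the Hugging Face hosting of the
# depth-conditioned checkpoint (512-depth-ema).
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
).to("cuda")

# The pipeline estimates depth from the init image with MiDaS internally, so a
# prompt plus an init image ("room.png" is a placeholder) is all that is needed
# for structure-preserving img2img.
init_image = load_image("room.png")
result = pipe(
    prompt="a cozy wooden cabin interior, warm lighting",
    image=init_image,
    strength=0.7,
).images[0]
result.save("depth2img.png")
```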

Version 1

  • sd-v1-1.ckpt:
    237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
  • sd-v1-2.ckpt:
    Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
  • sd-v1-3.ckpt:
    Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-4.ckpt:
    Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-5.ckpt:
    Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
  • sd-v1-5-inpainting.ckpt:
Resumed from sd-v1-5.ckpt. 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and in 25% of cases mask everything.
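
Any of the v1 checkpoints listed above can be loaded directly from the single .ckpt file in recent releases of the diffusers library. This is a hedged sketch, assuming a diffusers version that ships the from_single_file helper; the local path is a placeholder.

```python
import torch
from diffusers import StableDiffusionPipeline

# from_single_file converts a monolithic .ckpt/.safetensors checkpoint into a
# diffusers pipeline; "sd-v1-5.ckpt" is a placeholder for a local path to any
# of the v1 checkpoints listed above.
pipe = StableDiffusionPipeline.from_single_file(
    "sd-v1-5.ckpt", torch_dtype=torch.float16
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("v1_sample.png")
```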

Inpainting

  • 512-inpainting-ema.ckpt
Resumed from 512-base-ema.ckpt and trained for another 200k steps. Follows the mask-generation strategy presented in LAMA which, in combination with the latent VAE representations of the masked image, is used as additional conditioning. The additional input channels of the U-Net which process this extra information were zero-initialized. The same strategy was used to train the 1.5-inpainting checkpoint.
  • sd-v1-5-inpainting.ckpt
Resumed from sd-v1-2.ckpt. 595k steps at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning to improve classifier-free guidance sampling. Then 440k steps of inpainting training at resolution 512x512 on “laion-aesthetics v2 5+” and 10% dropping of the text-conditioning. For inpainting, the UNet has 5 additional input channels (4 for the encoded masked-image and 1 for the mask itself) whose weights were zero-initialized after restoring the non-inpainting checkpoint. During training, we generate synthetic masks and in 25% of cases mask everything.
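
A minimal sketch of driving the inpainting checkpoints above through the diffusers library; the model id runwayml/stable-diffusion-inpainting (corresponding to sd-v1-5-inpainting) is an assumption based on the Hugging Face hosting, and the file names are placeholders.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

# Model id is an assumption based on the Hugging Face hosting of the
# sd-v1-5-inpainting checkpoint; for 512-inpainting-ema, the corresponding
# repository should be stabilityai/stable-diffusion-2-inpainting.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# image and mask ("photo.png" / "mask.png" are placeholders) must have the same
# size; white pixels in the mask mark the region to be repainted.
image = load_image("photo.png")
mask = load_image("mask.png")
result = pipe(prompt="a red brick wall", image=image, mask_image=mask).images[0]
result.save("inpainted.png")
```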
