pip install diffusers
The following code is taken directly from https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion_2
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch
repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16, revision="fp16")
The last line uses half precision, which is only available on Nvidia GPUs. So you need to change it to
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float32)
The code goes on with
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to("cuda")
The last line requires an Nvidia or AMD GPU (with ROCm installed). You can omit it to use the CPU, but on Apple silicon devices you can change cuda to mps to use the GPU:
pipe = pipe.to("mps")
The example then finishes with
prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, guidance_scale=9, num_inference_steps=25).images[0]
image.save("astronaut.png")
For more parameters of the pipe call, see https://huggingface.co/docs/diffusers/v0.15.0/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__
The image resolution is tied to the model. The example above generates 512x512 images; to generate 768x768 pixel images you have to switch the repo_id to stabilityai/stable-diffusion-2. The pipe call does accept height and width parameters, but results are usually best at the resolution the model was trained for.
If you want to use other models from huggingface.co just change the repo_id.
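The device and precision choices discussed above can be collected in one place. The following is only a minimal sketch: the helper name choose_device_and_dtype is made up for illustration, and the rule "cuda means float16" is an assumption, since the text only ties half precision to Nvidia GPUs.

```python
from typing import Tuple


def choose_device_and_dtype(cuda_available: bool, mps_available: bool) -> Tuple[str, str]:
    """Hypothetical helper: map backend availability to (device, dtype name)."""
    if cuda_available:
        # Assumption: any "cuda" device may use half precision; the text
        # only states this for Nvidia GPUs.
        return "cuda", "float16"
    if mps_available:
        # Apple silicon GPU, full precision as in the modified example.
        return "mps", "float32"
    # No GPU backend found: stay on the CPU with full precision.
    return "cpu", "float32"


try:
    import torch

    cuda_ok = torch.cuda.is_available()
    mps_ok = torch.backends.mps.is_available()
except (ImportError, AttributeError):
    # torch is missing or too old to report the mps backend;
    # this sketch then simply falls back to the CPU.
    cuda_ok = mps_ok = False

device, dtype_name = choose_device_and_dtype(cuda_ok, mps_ok)
print(f"device={device}, torch_dtype=torch.{dtype_name}")
```

The result could then be used as torch_dtype=getattr(torch, dtype_name) in from_pretrained and as pipe.to(device) afterwards.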
Please note that models that generate larger images need much more time per image. Since you usually need several tries until you get the image you want, this extra time adds up. To get around this you can use upscalers. One is an x2 upscaler that can turn 512x512 into 1024x1024, and another is an x4 upscaler that can turn 512x512 into 2048x2048. Of course the x4 needs many more resources, so I have only tested the x2.
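To put the two upscale factors into numbers, here is a small arithmetic sketch. The function upscaled_size is a made-up helper, and pixel count is only a rough proxy for the real time and memory cost, which depends on the upscaler itself.

```python
def upscaled_size(width, height, factor):
    """Return (new_width, new_height, pixel_multiplier) for an integer factor."""
    # Both sides grow by the factor, so the pixel count grows by its square.
    return width * factor, height * factor, factor * factor


for factor in (2, 4):
    new_w, new_h, multiplier = upscaled_size(512, 512, factor)
    print(f"x{factor}: 512x512 -> {new_w}x{new_h} ({multiplier} times as many pixels)")
```

So the x4 upscaler has to produce four times as many pixels as the x2 one for the same input image.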
Here is some sample code that will generate an image and scale it up:
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch
repo_id = "stabilityai/stable-diffusion-2-base"
pipe = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float32, revision="fp16")
prompt = "High quality photo of an astronaut riding a horse in space"
image = pipe(prompt, num_inference_steps=25).images
from diffusers import StableDiffusionLatentUpscalePipeline, StableDiffusionPipeline
upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained("
upscaled_image = upscaler(prompt=prompt, image=image, num_inference_steps=20).images
I have also tried upscaling the upscaled_image. While the inference steps were quite fast, I got an out-of-memory error on any computer with less than 64GB of memory, and the post-processing (after the inference steps) took very long.
Speaking of performance: on my MacBook M1, GPU image generation beats the CPU by 40 seconds vs. 2 minutes. This is great, but 40 seconds can still feel very long. So when I learned that starting with macOS 13.1 you can convert the models to use the Neural Engine, I wanted to give it a try.
I cloned the code from https://github.com/apple/ml-stable-diffusion. The README states that you have to download the models. The good news is that if you have executed the previous Python code, you already have the models.
See ~/.cache/huggingface/hub. For the example above the model was in models--stabilityai--stable-diffusion-2-base. There is a snapshots directory containing directories named with 40 hexadecimal characters. If you have run the example only once there will be only one directory. In my case it was ~/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-base/snapshots/1cb61502fc8b634cdb04e7cd69e06051a728bedf.
So my call to convert the models was
python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o ../conv_output --bundle-resources-for-swift-cli
Please note the parameter --bundle-resources-for-swift-cli. This is necessary to use the image generation with Swift. You want to use Swift because the Python code takes a very long time to load the model every time, while the Swift tool only does this on the very first run. After that the model loads very fast and the image generation drops to about 25 seconds.
So using the Neural Engine gave me a good boost.
Here is a sample to run the swift tool:
swift run StableDiffusionSample "High quality photo of an astronaut riding a horse in space" --resource-path ../conv_output/Resources/ \
    --step-count 35 --disable-safety --image-count 4 --output-path ../generated_images
A final tip: DiffusionPipeline.from_pretrained always connects to huggingface.co to check for updates. If you don't want this you can set the local path as repo_id. See this example:
from os.path import expanduser
repo_id = expanduser("~")+"/.cache/huggingface/hub/models--stabilityai--stable-diffusion-2-base/snapshots/1cb61502fc8b634cdb04e7cd69e06051a728bedf"
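Instead of hard-coding the 40-character directory name, the snapshot could also be looked up from the cache layout described earlier. This is only a sketch under that assumption: the helpers hub_model_dir and list_snapshots are made up for illustration and are not part of diffusers, and the layout is the one observed on my machine.

```python
from pathlib import Path
from typing import List, Optional


def hub_model_dir(repo_id: str, cache_root: Path) -> Path:
    """Map e.g. 'stabilityai/stable-diffusion-2-base' to its cache directory."""
    # Assumed layout: models--<organisation>--<model name>
    return cache_root / ("models--" + repo_id.replace("/", "--"))


def list_snapshots(repo_id: str, cache_root: Optional[Path] = None) -> List[Path]:
    """Return all snapshot directories found for repo_id, sorted by name."""
    if cache_root is None:
        cache_root = Path.home() / ".cache" / "huggingface" / "hub"
    snapshots = hub_model_dir(repo_id, Path(cache_root)) / "snapshots"
    if not snapshots.is_dir():
        # Model was never downloaded, or the cache lives somewhere else.
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


found = list_snapshots("stabilityai/stable-diffusion-2-base")
if found:
    # With a single download there is exactly one entry.
    print("local repo_id:", found[0])
else:
    print("no local snapshot found")
```

If there is more than one snapshot directory you still have to decide which one to use; the sketch simply prints the first.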