Our blog post describes the latest improvements in Core ML that make it possible to create smaller models and run them faster. This is a step-by-step guide that focuses on how to convert and run any Stable Diffusion model from the Hub.
Steps
Install apple/ml-stable-diffusion
This is the package you’ll use to perform the conversion. It’s written in Python, and you can run it on a Mac or on Linux. If you run it on Linux, however, you won’t be able to test the converted models or compile them to `.mlmodelc` format.
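A minimal setup sketch (assuming Python and git are already installed; check the repo’s README for the authoritative, up-to-date instructions):

```bash
# Clone Apple's conversion package and install it (and its dependencies)
# into the current Python environment.
git clone https://github.com/apple/ml-stable-diffusion.git
cd ml-stable-diffusion
pip install -e .
```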
Find the model you want
The conversion script will automatically download models from the Hugging Face Hub, so you need to ensure the model is available there. If you have fine-tuned a Stable Diffusion model yourself, you can also use a local path in your filesystem.
An easy way to locate interesting models is browsing the Diffusers Models Gallery.
In this guide we’ll be converting Open Journey v4 by PromptHero.
- Ensure the model is in `diffusers` format. If it’s just a single “checkpoint” file, you can use this Space to convert it to `diffusers`.
- If you find a model that is not available in the Hub, consider uploading it for free. All you need is a Hugging Face Hub account, and you can make models public or private for your own use.
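As a quick sanity check (my own convention, not part of the conversion script): a repo in `diffusers` format has a `model_index.json` file at its root, next to subfolders such as `unet`, `text_encoder` and `vae`. You can peek at it directly:

```bash
# If this prints JSON describing the pipeline components, the repo is in
# diffusers format; an error message suggests it's a single-file checkpoint.
curl -sL https://huggingface.co/prompthero/openjourney-v4/raw/main/model_index.json | head
```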
Decide your target hardware
The fastest engine and conversion options require beta software. If you want the best experience possible you’ll need to:
- Install coremltools 7.0 beta: `pip install coremltools==7.0b1`
- Install iOS 17 or macOS 14 (Sonoma). Visit developer.apple.com and follow the instructions there.
- Install Xcode 15 beta, also from Apple Developer.
If you don’t want to upgrade your devices, or if you plan to distribute your apps to others, you’ll need to use the latest production versions of these tools instead.
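Either way, it’s worth double-checking which coremltools version ended up in your environment before converting:

```bash
# Expect something like 7.0b1 if you installed the beta,
# or a 6.x release if you stayed on the stable version.
python -c "import coremltools; print(coremltools.__version__)"
```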
Run the conversion process using the ORIGINAL attention implementation
The attention blocks are critical for performance, and there are two main implementations of the algorithm. The problem is that there’s no easy way to know which implementation will be faster on a particular device, so I recommend you try them both. Some general rules:
- `SPLIT_EINSUM_V2` is usually faster on iOS/iPadOS devices, and sometimes on high-end models such as M2 computers with lots of Neural Engine cores.
- `ORIGINAL` is usually faster on M1 Macs.
This is how to run the conversion process. Please note that some options depend on whether you are targeting the iOS 17 or macOS 14 betas:
python -m python_coreml_stable_diffusion.torch2coreml \
--model-version prompthero/openjourney-v4 \
--convert-unet \
--convert-text-encoder \
--convert-vae-decoder \
--convert-vae-encoder \
--convert-safety-checker \
--quantize-nbits 6 \
--attention-implementation ORIGINAL \
--compute-unit CPU_AND_GPU \
--bundle-resources-for-swift-cli \
--check-output-correctness \
-o models/openjourney-6-bit/original
Notes on the options:
- `--model-version`: the Hub model id you want to convert.
- `--convert-vae-encoder`: optional, only needed if you want to use input images (in-painting, image-to-image tasks).
- `--quantize-nbits 6`: requires beta software. Use `--chunk-unet` instead if you don’t use it.
- `--compute-unit CPU_AND_GPU`: the `ORIGINAL` implementation runs on CPU and GPU, but not on the Neural Engine (ANE).
- `--bundle-resources-for-swift-cli` and `--check-output-correctness`: require conversion on a Mac.
- `-o`: the destination folder.
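If you are not using the betas, a sketch of the same conversion without 6-bit quantization could look like the following; the flag combination is illustrative, and `--chunk-unet` (which splits the UNet into two pieces, typically needed for iOS deployments) replaces `--quantize-nbits` as suggested above:

```bash
# Illustrative non-beta variant: no 6-bit quantization, UNet chunked instead.
# Omit --chunk-unet if you only target macOS.
python -m python_coreml_stable_diffusion.torch2coreml \
    --model-version prompthero/openjourney-v4 \
    --convert-unet \
    --convert-text-encoder \
    --convert-vae-decoder \
    --convert-vae-encoder \
    --convert-safety-checker \
    --chunk-unet \
    --attention-implementation ORIGINAL \
    --compute-unit CPU_AND_GPU \
    --bundle-resources-for-swift-cli \
    --check-output-correctness \
    -o models/openjourney/original
```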
Run the conversion process using the SPLIT_EINSUM_V2 attention implementation
As mentioned above, this attention implementation is able to use the Neural Engine in addition to the GPU, and is usually the best choice for iOS devices. It may also be faster on Macs, especially on high-end ones.
python -m python_coreml_stable_diffusion.torch2coreml \
--model-version prompthero/openjourney-v4 \
--convert-unet \
--convert-text-encoder \
--convert-vae-decoder \
--convert-vae-encoder \
--convert-safety-checker \
--quantize-nbits 6 \
--attention-implementation SPLIT_EINSUM_V2 \
--compute-unit ALL \
--bundle-resources-for-swift-cli \
--check-output-correctness \
-o models/openjourney-6-bit/split_einsum_v2
Notes on the options:
- `--model-version`: the Hub model id you want to convert.
- `--convert-vae-encoder`: optional, only needed if you want to use input images (in-painting, image-to-image tasks).
- `--quantize-nbits 6`: requires beta software. Use `--chunk-unet` instead if you don’t use it.
- `--compute-unit ALL`: `SPLIT_EINSUM_V2` runs on all available devices (CPU, GPU, Neural Engine).
- `--bundle-resources-for-swift-cli` and `--check-output-correctness`: require conversion on a Mac.
- `-o`: the destination folder.
Understanding the conversion artifacts
Once you run the two conversion commands, the output folder will have the following structure:
openjourney-6-bit
├── original
│ ├── Resources
│ ├── Stable_Diffusion_version_prompthero_openjourney-v4_safety_checker.mlpackage
│ ├── Stable_Diffusion_version_prompthero_openjourney-v4_text_encoder.mlpackage
│ ├── Stable_Diffusion_version_prompthero_openjourney-v4_unet.mlpackage
│ ├── Stable_Diffusion_version_prompthero_openjourney-v4_vae_decoder.mlpackage
│ └── Stable_Diffusion_version_prompthero_openjourney-v4_vae_encoder.mlpackage
└── split_einsum_v2
├── Resources
├── Stable_Diffusion_version_prompthero_openjourney-v4_safety_checker.mlpackage
├── Stable_Diffusion_version_prompthero_openjourney-v4_text_encoder.mlpackage
├── Stable_Diffusion_version_prompthero_openjourney-v4_unet.mlpackage
├── Stable_Diffusion_version_prompthero_openjourney-v4_vae_decoder.mlpackage
└── Stable_Diffusion_version_prompthero_openjourney-v4_vae_encoder.mlpackage
- The `mlpackage` files are the Core ML versions of each component of the Stable Diffusion model. These files are suitable for integration in a native app, or to run inference using Python, as we’ll see below.
- The `Resources` folders contain the compiled versions of the same files, with the extension `.mlmodelc`. These are suitable for download in a native app, or to run inference with Swift.
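As an aside, if you ever need to (re)compile an `.mlpackage` into `.mlmodelc` yourself on a Mac, Xcode’s command-line tools include `coremlcompiler`; a hedged example using one of the files above:

```bash
# Compile a Core ML package to .mlmodelc (requires Xcode command-line tools).
# The second argument is the destination directory for the compiled model.
xcrun coremlcompiler compile \
  models/openjourney-6-bit/original/Stable_Diffusion_version_prompthero_openjourney-v4_unet.mlpackage \
  models/openjourney-6-bit/original/Resources
```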
Use the command-line tools to verify conversion
This requires a Mac, because you need Apple’s `CoreML` framework in order to run Core ML models. The first time you run inference it will take several minutes, as `CoreML` will compile the models (if necessary), analyze them, and decide which compute engines to use for best performance. Subsequent runs will be much faster because the planning phase is cached.
To use the Python CLI, pass the `-i` argument to indicate the location where all the `mlpackage` files reside. Be sure to use the same `--model-version` you used when converting:
python -m python_coreml_stable_diffusion.pipeline \
--model-version prompthero/openjourney-v4 \
--prompt "a photo of an astronaut riding a horse on mars" \
-i models/openjourney-6-bit/split_einsum_v2 \
--compute-unit CPU_AND_NE \
-o output/split_einsum_v2 \
--seed 43
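Because the fastest attention implementation depends on your hardware, a simple way to compare them is to time the same prompt with both converted variants (the numbers are only meaningful from the second run onwards, once the Core ML compilation and planning phase has been cached):

```bash
# Time the ORIGINAL variant (CPU and GPU only).
time python -m python_coreml_stable_diffusion.pipeline \
    --model-version prompthero/openjourney-v4 \
    --prompt "a photo of an astronaut riding a horse on mars" \
    -i models/openjourney-6-bit/original \
    --compute-unit CPU_AND_GPU \
    -o output/original \
    --seed 43

# Time the SPLIT_EINSUM_V2 variant (can also use the Neural Engine).
time python -m python_coreml_stable_diffusion.pipeline \
    --model-version prompthero/openjourney-v4 \
    --prompt "a photo of an astronaut riding a horse on mars" \
    -i models/openjourney-6-bit/split_einsum_v2 \
    --compute-unit CPU_AND_NE \
    -o output/split_einsum_v2 \
    --seed 43
```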
There’s also a Swift CLI that is similar to the Python one. In this case, you need to point it to the location where the compiled `Resources` are, and it’s not necessary to indicate what the original model version was. In this example we’ll test the `ORIGINAL` variant:
swift run StableDiffusionSample \
"a photo of an astronaut riding a horse on mars" \
--resource-path models/openjourney-6-bit/original/Resources \
--compute-units cpuAndGPU \
--output-path . \
--seed 43
The first time you run the Swift CLI, it will be compiled for you. Use `--help` to display a list of all the options you can use.
Upload Core ML models to the Hub
We’ll perform the following actions:
- Rename the `Resources` folder to `compiled`.
- Put all the `.mlpackage` files inside a folder called `packages`.
- Create two `zip` archives with the contents of each of the `compiled` folders. These archives are useful for third-party apps (a sketch of these shell steps follows the file layout below).
- Create a new model in the Hub and upload all these contents to it. You can also open a pull request to the original repo instead of creating a new one.
- Don’t forget to create a model card (a `README.md` file) to acknowledge the original authors and describe the contents of the model.
- Add the `core-ml` tag to the YAML section of the model card. This is important for search and discoverability!
This is the file structure in my filesystem, just before upload:
openjourney-6-bit
├── README.md
├── coreml-prompthero-openjourney-v4-palettized_original_compiled.zip
├── coreml-prompthero-openjourney-v4-palettized_split_einsum_compiled.zip
├── original
│ ├── compiled
│ └── packages
└── split_einsum_v2
├── compiled
└── packages
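A sketch of shell commands that produce this layout for the `original` variant (the `split_einsum_v2` folder is handled the same way; adapt names and paths to your model):

```bash
# Packaging steps for one variant; repeat for split_einsum_v2.
cd models/openjourney-6-bit/original
mv Resources compiled                        # rename the compiled models folder
mkdir packages && mv *.mlpackage packages/   # group the mlpackage files
cd ..
# Archive the compiled folder so third-party apps can download it.
zip -r coreml-prompthero-openjourney-v4-palettized_original_compiled.zip original/compiled
```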
And this is the repo I uploaded everything to.
Use the Core ML models
To run the models in a third-party app, you’ll typically need to configure the app to download any of the zip files we created earlier. Some app examples are Swift Diffusers or Mochi Diffusion, which was initially based on the former.
If you want to integrate the models in your own app, you can drag and drop the `.mlpackage` files into your Xcode project. Xcode will then compile the models and bundle them as part of your application. You can write your own code for inference, or you can add `apple/ml-stable-diffusion` as a package dependency and use the `StableDiffusion` Swift module.
Feedback
Want to suggest improvements or fixes to this documentation? Visit the repo to report an issue or open a PR :)