You can package any model as a Truss.
truss.create() is a convenient shortcut for packaging in-memory models built in supported frameworks, but the manual approach gives you control and flexibility throughout the entire model packaging and deployment process.
This doc walks through the process of manually creating a Truss, using Stable Diffusion v1.5 as an example.
To get started, initialize the Truss with the following command in the CLI:
truss init sd_truss
This will create the following file structure:
sd_truss/            # Truss root
    data/            # Stores serialized models/weights/binaries
    model.py         # Implements Model class
    packages/        # Stores utility code for model.py
    config.yaml      # Config for model serving environment
    examples.yaml    # Invocation examples
Most of our development work will happen in models/model.py. The first function you'll need to implement is load().
When the model is spun up to receive requests, load() is called exactly once and is guaranteed to finish before any predictions are attempted.
The purpose of load() is to set a value for self._model. This requires deserializing your model or otherwise loading in your model weights.
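The load-once pattern can be sketched without any ML framework. In the minimal example below, EchoModel is a hypothetical stand-in for a real model (it just returns its input), and the Model class is a simplified illustration rather than the Stable Diffusion code used in this doc.

```python
from typing import Any, Dict


class EchoModel:
    """Hypothetical stand-in for a real model; returns its inputs unchanged."""

    def __call__(self, **kwargs: Any) -> Dict[str, Any]:
        return dict(kwargs)


class Model:
    def __init__(self) -> None:
        # Nothing is loaded at construction time; load() does the heavy work.
        self._model = None

    def load(self) -> None:
        # In a real Truss, this is where weights would be deserialized.
        self._model = EchoModel()

    def predict(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Relies on load() having already set self._model.
        return self._model(**model_input)


model = Model()
model.load()  # called once, before any predictions
print(model.predict({"prompt": "a photo of an astronaut"}))
```

Keeping the expensive work in load() rather than in __init__ or predict() means it is paid once per model server start, not once per request.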
Example: Stable Diffusion 1.5
The exact code you'll need will depend on your model and framework. In this example, model weights for Stable Diffusion 1.5 come from the HuggingFace diffusers package.
This requires a couple of imports (don't worry, we'll cover adding Python requirements in a bit).
from dataclasses import asdict
from typing import Dict
from diffusers import EulerDiscreteScheduler, StableDiffusionPipeline
The load function looks like:
scheduler = EulerDiscreteScheduler.from_pretrained(
self._model = StableDiffusionPipeline.from_pretrained(
self._model = self._model.to("cuda")
self._model could be set using weights from anywhere. If you have custom weights, you can load them from your Truss' data/ directory by following this guide.
The other key function in your Truss is predict(), which handles model invocation.
As our loaded model is a StableDiffusionPipeline object, model invocation is pretty simple:
def predict(self, model_input: Dict):
    response = self._model(**model_input)
    return asdict(response)  # Convert the pipeline output to a plain dict
All we have to do is pass the model input to the model.
But how do we make sure the model input is in a valid format, and that the model output is usable?
By default, pre- and post-processing functions are passthroughs. But if needed, you can implement these functions to make your model input and output match the specification of whatever app or API you're building.
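To make the passthrough behavior concrete, here is a rough sketch of how the three steps could chain together. The call order (preprocess, then predict, then postprocess) and the run() helper are assumptions made for illustration, not Truss's actual serving code, and predict() is a hypothetical stand-in for real inference.

```python
from typing import Any, Dict


class Model:
    def preprocess(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Default behavior: pass the input through untouched.
        return model_input

    def predict(self, model_input: Dict[str, Any]) -> Dict[str, Any]:
        # Hypothetical stand-in for real inference.
        return {"echo": model_input["prompt"]}

    def postprocess(self, model_output: Dict[str, Any]) -> Dict[str, Any]:
        # Default behavior: pass the output through untouched.
        return model_output


def run(model: Model, request: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: chain the three steps in order."""
    return model.postprocess(model.predict(model.preprocess(request)))


print(run(Model(), {"prompt": "hello"}))
```

Overriding only one of the two processing functions leaves the other as a passthrough, which is the situation in the Stable Diffusion example.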
There are more in-depth docs on processing functions here, but here's sample code for the Stable Diffusion example, which needs a post-processing function but not a pre-processing function:
def postprocess(self, model_output: Dict) -> Dict:
    # Convert to base64
    model_output["images"] = [pil_to_b64(img) for img in model_output["images"]]
    return model_output
Eagle-eyed readers will note that pil_to_b64() is not a function that has been defined anywhere. How can we use it?
Here is an implementation of the pil_to_b64() function from the last step:
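import base64
from io import BytesIO
from PIL import Image

def pil_to_b64(pil_img: Image.Image) -> str:
    buffered = BytesIO()
    pil_img.save(buffered, format="PNG")  # Serialize the image to PNG bytes in memory
    img_str = base64.b64encode(buffered.getvalue())
    return "data:image/png;base64," + img_str.decode("utf-8")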
You could just paste this into models/model.py and call it a day. But it's better to factor out helper functions and utilities so that they can be reused across multiple Trusses.
Let's create a folder shared at the same level as our root sd_truss directory (don't create it inside the Truss directory). Then create a file shared/base64_utils.py and paste the code from above into it.
Let your Truss know where to look for external packages by listing the shared directory in config.yaml.
Note that this setting is an array in YAML; your Truss can depend on multiple external directories for packages.
Finally, at the top of models/model.py, add:
from base64_utils import pil_to_b64
This will import your function from your external directory.
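A module in a sibling directory can be imported by bare name once that directory is on the Python import path. The sketch below illustrates this general mechanism with plain sys.path manipulation; it is not a description of Truss internals, and demo_utils and greet() are hypothetical names used only for the demo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway "shared" directory containing one module.
shared_dir = Path(tempfile.mkdtemp()) / "shared"
shared_dir.mkdir()
(shared_dir / "demo_utils.py").write_text(
    "def greet(name):\n    return 'hello ' + name\n"
)

# Putting the directory on sys.path makes its modules importable by name.
sys.path.append(str(shared_dir))
importlib.invalidate_caches()

demo_utils = importlib.import_module("demo_utils")
print(demo_utils.greet("truss"))
```

Because the module is resolved by name, two shared directories that each contain a file with the same name would shadow one another, so distinct module names are the safer choice.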
For more details on bundled and shared packages, see this demo repository and the bundled packages docs.
Now, we switch our attention to config.yaml. You can use this file to customize a great deal about your packaged model (here's a complete reference), but right now we just care about setting our Python requirements up so the model can run.
For that, find requirements: in the config file. In the Stable Diffusion 1.5 example, we set it to:
These requirements work just like requirements.txt in a Python project, and you can pin versions with ==, for example package==1.0.0.
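Pinning matters for reproducibility, since an unpinned requirement can resolve to a different version on each build. As a small illustration, the hypothetical helper below flags entries without an exact == pin. The requirement list and version numbers are made up for the demo and are not the list from the Stable Diffusion example.

```python
from typing import List


def unpinned(requirements: List[str]) -> List[str]:
    """Return the entries that lack an exact '==' version pin."""
    return [req for req in requirements if "==" not in req]


# Hypothetical requirement list; names and versions are illustrative only.
reqs = ["diffusers==0.11.1", "transformers", "accelerate>=0.15"]
print(unpinned(reqs))
```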
Large models like Stable Diffusion require powerful hardware to run invocations. Set your packaged model's hardware requirements in config.yaml:
accelerator: A10G # Type of GPU required
cpu: "8" # Number of vCPU cores required
memory: 30Gi # Mebibytes (Mi) or Gibibytes (Gi) of RAM required
use_gpu: true # If false, set accelerator: null
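The Mi and Gi suffixes are binary units: one mebibyte is 2**20 bytes and one gibibyte is 2**30 bytes. The hypothetical helper below, which is not part of Truss, converts a memory string in that notation to bytes to show what a value like 30Gi amounts to.

```python
_UNITS = {"Mi": 2**20, "Gi": 2**30}


def memory_to_bytes(value: str) -> int:
    """Convert a string such as '30Gi' or '512Mi' to a number of bytes."""
    number, unit = value[:-2], value[-2:]
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit in {value!r}")
    return int(number) * _UNITS[unit]


print(memory_to_bytes("30Gi"))  # 32212254720
```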
You've successfully packaged a model! If you have the required hardware, you can test it locally, or deploy it to Baseten to get a draft model for rapid iteration in a production environment.
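When testing, keep in mind that the post-processing step returns each image as a base64 data URI string rather than as raw image bytes. A client could turn such a string back into bytes roughly as follows; data_uri_to_bytes() is a hypothetical helper, and the round trip uses stand-in bytes instead of a real PNG so that it runs without any image library.

```python
import base64


def data_uri_to_bytes(data_uri: str) -> bytes:
    """Strip the 'data:...;base64,' header and decode the payload."""
    header, _, payload = data_uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


# Round trip with stand-in bytes instead of a real PNG.
fake_png = b"\x89PNG not a real image"
uri = "data:image/png;base64," + base64.b64encode(fake_png).decode("utf-8")
assert data_uri_to_bytes(uri) == fake_png
```

The decoded bytes can then be written to a .png file or opened with an image library.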