config.yaml
environment_variables: {}
model_name: private-model
python_version: py39
requirements:
- torch==2.0.1
- transformers==4.30.2
resources:
  cpu: "1"
  memory: 2Gi
  use_gpu: false
  accelerator: null
secrets:
  hf_access_token: null
system_packages: []

Summary

To load a gated or private model from Hugging Face:

  1. Create an access token on your Hugging Face account.
  2. Add the hf_access_token key to the secrets section of your config.yaml and set its value in your Baseten workspace secret manager.
  3. Pass use_auth_token to the pipeline call in model/model.py.

Example code:

config.yaml
secrets:
  hf_access_token: null

Step-by-step example

BERT base (uncased) is a masked language model that can be used to infer missing words in a sentence.

While the model is publicly available on Hugging Face, we copied it into a gated model to use in this tutorial. The process is the same for using a gated model as it is for a private model.

The full config.yaml for the finished private model Truss appears at the top of this page. Keep reading for step-by-step instructions on how to build it.

This example will cover:

  1. Implementing a transformers.pipeline model in Truss
  2. Securely accessing secrets in your model server
  3. Using a gated or private model with an access token

Step 0: Initialize Truss

Get started by creating a new Truss:

truss init private-bert

Give your model a name when prompted, like Private Model Demo. Then, navigate to the newly created directory:

cd private-bert
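
The exact scaffolding varies by Truss version, but the new directory should contain at least the two files this tutorial edits:

private-bert/
├── config.yaml     # model server configuration
└── model/
    └── model.py    # Model class implementation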

Step 1: Implement the Model class

BERT base (uncased) is available as a transformers pipeline model, so it is straightforward to implement in Truss.

In model/model.py, we write the class Model with three member functions:

  • __init__, which creates an instance of the object with a _model property
  • load, which runs once when the model server is spun up and loads the pipeline model
  • predict, which runs each time the model is invoked and handles the inference. It can use any JSON-serializable type as input and output.

Read the quickstart guide for more details on Model class implementation.

model/model.py
from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model"
        )

    def predict(self, model_input):
        return self._model(model_input)
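
To see the input/output contract concretely, you can run the same pipeline against the public bert-base-uncased checkpoint (no access token needed); the gated copy used in this tutorial behaves identically:

from transformers import pipeline

# Public checkpoint, used here only to illustrate the fill-mask output shape
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("It is a [MASK] world"))
# Prints a list of candidate completions, each a dict with keys
# "score", "token", "token_str", and "sequence"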

Step 2: Set Python dependencies

Now, we can turn our attention to configuring the model server in config.yaml.

BERT base (uncased) has two dependencies:

config.yaml
requirements:
- torch==2.0.1
- transformers==4.30.2

Always pin exact versions for your Python dependencies. The ML/AI space moves fast, so pinning lets you use a recent version of each package while staying protected from breaking changes in future releases.
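
If you developed the model locally, one way to find the exact versions to pin is to check your local environment (this assumes pip is your package manager):

pip show torch transformers | grep -E "^(Name|Version)"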

Step 3: Set required secret

Now it’s time to set up access to the gated model:

  1. Go to the model page on Hugging Face and accept the terms to access the model.
  2. Create an access token on your Hugging Face account.
  3. Add the hf_access_token key and value to your Baseten workspace secret manager.
  4. In your config.yaml, add the key hf_access_token:
config.yaml
secrets:
  hf_access_token: null

Never set the actual value of a secret in the config.yaml file. Only put secret values in secure places, like the Baseten workspace secret manager.
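
To make the distinction concrete, the commented-out lines below show the pattern to avoid; the null placeholder tells Baseten to supply the real value from your workspace secret manager at runtime (the token shown is a made-up placeholder):

config.yaml
# Never do this — a real token committed to config.yaml can leak:
# secrets:
#   hf_access_token: hf_XXXXXXXXXXXXXXXX
# Do this instead:
secrets:
  hf_access_token: null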

Step 4: Use access token in load

In model/model.py, you can give your model access to secrets in the __init__ function:

model/model.py
def __init__(self, **kwargs) -> None:
    self._secrets = kwargs["secrets"]
    self._model = None

Then, update the load function with use_auth_token:

model/model.py
self._model = pipeline(
    "fill-mask",
    model="baseten/docs-example-gated-model",
    use_auth_token=self._secrets["hf_access_token"]
)

This will allow the pipeline function to load the specified model from Hugging Face.
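
Putting Steps 1 and 4 together, the complete model/model.py looks like this:

model/model.py
from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        # Truss passes workspace secrets to the model at construction time
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        # Runs once when the model server spins up; authenticates to
        # Hugging Face with the stored access token
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model",
            use_auth_token=self._secrets["hf_access_token"]
        )

    def predict(self, model_input):
        # Runs on every invocation; input and output are JSON-serializable
        return self._model(model_input)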

Step 5: Deploy the model

You’ll need a Baseten API key for this step.

We have successfully packaged a gated model as a Truss. Let’s deploy!

Use --trusted with truss push to give the model server access to secrets stored on the remote host.

truss push --trusted

Wait for the model to finish deploying before invoking it.

You can invoke the model with:

truss predict -d '"It is a [MASK] world"'
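
If everything is wired up correctly, the model responds with a JSON array of candidate completions along these lines (the values shown are illustrative, not real output):

[
  {
    "score": 0.27,
    "token": 2235,
    "token_str": "small",
    "sequence": "it is a small world"
  },
  ...
]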