from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        # Truss passes secrets (including the Hugging Face token) as keyword arguments
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):
        # Authenticate with the Hugging Face access token to load the gated model
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model",
            use_auth_token=self._secrets["hf_access_token"],
        )

    def predict(self, model_input):
        return self._model(model_input)


In this example, we build a Truss for a model that requires Hugging Face authentication. The steps for loading a gated model from Hugging Face are:

  1. Create an access token on your Hugging Face account (see the optional check below).
  2. Add the `hf_access_token` key to the secrets in your config.yaml and set its value in your Baseten account.
  3. Pass `use_auth_token` when creating the pipeline.
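
Before moving on, you can optionally sanity-check the token from step 1. Here's a minimal sketch, assuming the huggingface_hub package is installed locally (the token string is a placeholder):

from huggingface_hub import whoami

# Raises an error if the token is invalid; otherwise returns account info
info = whoami(token="hf_...")  # placeholder: paste your actual access token
print(info["name"])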

Setting up the model

In this example, we use a private version of the BERT base model. The model is publicly available, but for the purposes of our example, we copied it into a private model repository with the path `baseten/docs-example-gated-model`.

As with other Hugging Face models, start by importing the `pipeline` function from the transformers library and defining the `Model` class.

model/model.py
from transformers import pipeline


class Model:

An important step in loading a model that requires authentication is having access to the secrets defined for this model. We pull these out of the keyword arguments in the `__init__` function.

model/model.py
    def __init__(self, **kwargs) -> None:
        self._secrets = kwargs["secrets"]
        self._model = None

    def load(self):

When you define the pipeline, use the `use_auth_token` parameter to pass the `hf_access_token` secret stored on your Baseten account.

model/model.py
        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model",
            use_auth_token=self._secrets["hf_access_token"],
        )

    def predict(self, model_input):
        return self._model(model_input)
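
Note that `use_auth_token` matches the transformers version pinned in the config below. Newer releases of transformers deprecate `use_auth_token` in favor of a `token` argument; if you upgrade the pin, the load step would look something like this (a sketch, not verified against every transformers release):

        self._model = pipeline(
            "fill-mask",
            model="baseten/docs-example-gated-model",
            token=self._secrets["hf_access_token"],
        )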

Setting up the config.yaml

The main things to set up in the config are the requirements, which must include Hugging Face transformers, and the secrets.

config.yaml
environment_variables: {}
model_name: private-model
python_version: py39
requirements:
- torch==2.0.1
- transformers==4.30.2
resources:
  cpu: "1"
  memory: 2Gi
  use_gpu: false
  accelerator: null

To make the `hf_access_token` secret available in the Truss, we need to include it in the config. Setting the value to `null` here means the actual value will be supplied by the Baseten secrets manager.

config.yaml
secrets:
  hf_access_token: null
system_packages: []
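
Note that the key name here must match the key the model code reads (`self._secrets["hf_access_token"]` in model.py); the secret's actual value never lives in the Truss itself.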

Deploying the model

An important note for deploying models with secrets: you must use the `--trusted` flag to give the model access to secrets stored in the remote secrets manager.

$ truss push --trusted

After the model finishes deploying, you can invoke it with:

$ truss predict -d '"It is a [MASK] world"'
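
The fill-mask pipeline returns a list of candidate completions for the [MASK] token, each with a confidence score. The response will look something like this (values are illustrative, not actual model output):

[
  {
    "score": 0.27,
    "token": 2307,
    "token_str": "great",
    "sequence": "it is a great world"
  },
  ...
]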