In this example, we build a Truss with a model that requires specific system packages.

To add system packages to your model serving environment, add a system_packages key to your config.yaml file with a list of apt-installable Debian packages, for instance:

config.yaml
system_packages:
- tesseract-ocr

For this example, we use the LayoutLM Document QA model, a multimodal model that answers questions about provided invoice documents. This model requires a system package, tesseract-ocr, which needs to be included in the model serving environment.
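
The reason the system package matters: the document-question-answering pipeline runs OCR on the invoice image through pytesseract, and pytesseract shells out to the tesseract binary that the tesseract-ocr package installs. As a rough sanity check (a sketch, not part of the Truss itself), you could verify the binary is visible from Python in the serving environment:

import pytesseract

# Raises TesseractNotFoundError if the tesseract-ocr system package is missing
print(pytesseract.get_tesseract_version())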

Setting up the model.py

For this model, we use the Hugging Face transformers library and its document-question-answering pipeline task.

model/model.py
from transformers import pipeline


class Model:
    def __init__(self, **kwargs) -> None:
        self._model = None

    def load(self):
        self._model = pipeline(
            "document-question-answering",
            model="impira/layoutlm-document-qa",
        )

    def predict(self, model_input):
        return self._model(model_input["url"], model_input["prompt"])
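
Truss calls load() once when the model server starts and predict() for each request. A minimal local smoke test of the class (a sketch, assuming you run it from the Truss root directory with the Python requirements below installed and the example invoice URL reachable) might look like:

# Hypothetical local test; not part of the packaged Truss
from model.model import Model

model = Model()
model.load()  # downloads impira/layoutlm-document-qa on first call
answers = model.predict({
    "url": "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
    "prompt": "What is the invoice number?",
})
print(answers)  # a list of candidate answers with confidence scores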

Setting up the config.yaml file

The main sections that need to be configured in the config.yaml file are requirements and system_packages.

config.yaml
environment_variables: {}
external_package_dirs: []
model_metadata:
  example_model_input: {"url": "https://templates.invoicehome.com/invoice-template-us-neat-750px.png", "prompt": "What is the invoice number?"}
model_name: LayoutLM Document QA
python_version: py39

In the requirements section, specify the Python packages the model needs, along with their versions.

Always pin exact versions for your Python dependencies. The ML/AI space moves fast, so you want to have an up-to-date version of each package while also being protected from breaking changes.

config.yaml
requirements:
- Pillow==10.0.0
- pytesseract==0.3.10
- torch==2.0.1
- transformers==4.30.2
resources:
  cpu: "4"
  memory: 16Gi
  use_gpu: false
  accelerator: null
secrets: {}
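
If you are unsure which versions to pin, one way (a sketch; adjust the pattern to your own dependency list) is to check what is installed in your development environment:

$ pip freeze | grep -iE "pillow|pytesseract|torch|transformers"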

The system_packages section is the other important part: here you can add any package that's available via apt on Debian.

config.yaml
system_packages:
- tesseract-ocr
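
To confirm that a package name is available via apt before adding it, you could search Debian's package index, for example from a Debian-based container (a sketch, assuming Docker is installed locally):

$ docker run --rm python:3.9-slim bash -c "apt-get update -qq && apt-cache search tesseract-ocr"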

Deploy the model

$ truss push

You can then invoke the model with:

$ truss predict -d '{"url": "https://templates.invoicehome.com/invoice-template-us-neat-750px.png", "prompt": "What is the invoice number?"}'
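
Once deployed, the model can also be invoked over HTTP. Below is a hedged sketch using requests; the endpoint format and the model-id placeholder follow Baseten's standard model invocation URL, so check your model dashboard for the exact values, and set a BASETEN_API_KEY environment variable with your API key:

import os

import requests

# "your-model-id" is a placeholder; find the real ID in your model dashboard
resp = requests.post(
    "https://model-your-model-id.api.baseten.co/production/predict",
    headers={"Authorization": f"Api-Key {os.environ['BASETEN_API_KEY']}"},
    json={
        "url": "https://templates.invoicehome.com/invoice-template-us-neat-750px.png",
        "prompt": "What is the invoice number?",
    },
)
print(resp.json())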