If your model depends on internal packages that are not available on PyPi, you can bundle this code with the Truss in the
packagesfolder. These packages then become available as top level modules to the model class's code when it executes, whether it is running locally, in any Docker container, or on Baseten.
When you import your Truss, the import mechanism add the packages in the Truss'
packagesdirectory to the path.
Great care should be taken to avoid conflicts between these packages with any python requirements. Note that the serving environment may itself be dependent on some standard Python modules such as
kserve, so it's best to avoid using package names that may conflict with any standard or popular Python packages.
There are situations where you may need to maintain code separately from your Truss. For example, you may want to share code across multiple Trusses without publishing it to PyPi. External packages are built for this use case.
A Truss can be provided a set of external folders as places to look up local packages, using the
config.yaml. This settings is similar to
bundled_packages_direxcept that these folders can be outside of the Truss folder. These external directories are then effectively available to the model and training python code to lookup modules and packages from.
A Truss that has any external packages is considered
scattered, because the Truss is not self contained. Correspondingly,
gathermeans collecting into a self contained Truss.
TrussHandlehas a method called
gatherto gather a scattered Truss.
Note that a scattered Truss will likely not work on another machine, unless all the
external_package_dirsare carefully replicated there as well. For example, this will work for Git repos that contain both the Truss and the
external_package_dirsand relative paths are used. Where possible, a scattered Truss should be gathered into an equivalent Truss using the
gatheroperation, before sending over.
For in-memory serving and training, i.e. without creating and running a Docker image, these external directories are added to sys.path and thus become available to model/training code.
For any Docker based execution, a gathered Truss is first created and then executed. This gathered Truss works behind the scenes and the user doesn't need to touch it directly. The gathered Truss is cached and reused, and only recreated if any part of the original Truss is modified. This flow will likely be optimized in future to only update the changed parts, rather than recreate.