In this video, I walk you through the simple process of deploying a Llama 3 8B model with Amazon SageMaker.
I use the latest version of the Hugging Face Text Generation Inference container (TGI 2.0) and show you how to run both synchronous and streaming inference.
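For reference, here is a minimal sketch of what the deployment and the two inference calls look like with the SageMaker Python SDK. The model ID, instance type, token limits, environment variables, and prompts below are illustrative assumptions, not values taken from the video.

```python
import json

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Pull the Hugging Face TGI container image for SageMaker (version pin is an assumption).
image_uri = get_huggingface_llm_image_uri("huggingface", version="2.0.0")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed model ID
        "SM_NUM_GPUS": "1",                                    # GPUs on the instance
        "MAX_INPUT_LENGTH": "4096",                            # illustrative token limits
        "MAX_TOTAL_TOKENS": "8192",
        "HUGGING_FACE_HUB_TOKEN": "<YOUR_HF_TOKEN>",           # needed for the gated Llama 3 weights
    },
)

# Deploy to a GPU instance (instance type is an assumption).
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
)

# --- Synchronous inference: one request, one full response ---
response = predictor.predict({
    "inputs": "What is Amazon SageMaker?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
})
print(response[0]["generated_text"])

# --- Streaming inference via the SageMaker runtime response stream ---
smr = boto3.client("sagemaker-runtime")
stream = smr.invoke_endpoint_with_response_stream(
    EndpointName=predictor.endpoint_name,
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "Explain streaming inference in one sentence.",
        "parameters": {"max_new_tokens": 128},
        "stream": True,
    }),
)
for event in stream["Body"]:
    # Each event carries a chunk of bytes (server-sent-event lines emitted by TGI).
    chunk = event.get("PayloadPart", {}).get("Bytes", b"")
    print(chunk.decode("utf-8"), end="")
```

The synchronous call returns the complete generation in a single response, while the streaming call yields tokens as TGI produces them, which is the pattern you would wire into a chat-style UI.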