In this video, I show you how to accelerate Transformer training with AWS Trainium, a custom chip purpose-built by AWS for deep learning training.
First, I walk you through the setup of an Amazon EC2 trn1.32xlarge instance, equipped with 16 Trainium chips (32 NeuronCores). Then, I run a natural language processing job, adapting existing Transformer training code for Trainium and fine-tuning a BERT model to classify the Yelp review dataset. Finally, I run the job on 1, 8, and 32 NeuronCores.
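To give you an idea of what the adaptation looks like, here is a minimal sketch (not the exact code from the video; that is in the GitLab repo linked below). The Neuron SDK exposes Trainium through PyTorch/XLA, so the main changes to the Hugging Face tutorial's training loop are targeting the XLA device instead of a GPU and stepping the optimizer through torch_xla:

```python
# Minimal sketch: fine-tuning BERT on Yelp reviews, adapted for Trainium.
# Assumes torch-neuronx and its torch-xla dependency are installed per the
# Neuron SDK documentation; model and dataset names follow the HF tutorial.
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch_xla.core.xla_model as xm  # Neuron exposes Trainium via PyTorch/XLA

device = xm.xla_device()  # key change: XLA device instead of "cuda"

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
dataset = load_dataset("yelp_review_full", split="train[:1%]")  # small slice for a demo
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], padding="max_length", truncation=True),
    batched=True,
)
dataset.set_format("torch", columns=["input_ids", "attention_mask", "label"])
loader = DataLoader(dataset, batch_size=8, shuffle=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=5
).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for batch in loader:
    optimizer.zero_grad()
    outputs = model(
        input_ids=batch["input_ids"].to(device),
        attention_mask=batch["attention_mask"].to(device),
        labels=batch["label"].to(device),
    )
    outputs.loss.backward()
    # key change: steps the optimizer and, in a multi-worker run,
    # all-reduces gradients across NeuronCores
    xm.optimizer_step(optimizer)
    xm.mark_step()  # tells XLA to compile and execute the queued graph
```

To scale from 1 to 32 NeuronCores, the Neuron documentation launches multi-worker PyTorch jobs with torchrun (e.g. `torchrun --nproc_per_node=32 train.py`); the exact commands I use are in the setup steps linked below.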
- AWS Trainium: https://aws.amazon.com/ec2/instance-types/trn1/
- AWS Neuron SDK documentation: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/frameworks/torch/index.html
- AWS Neuron SDK samples: https://github.com/aws-neuron/aws-neuron-samples
- Hugging Face tutorial: https://huggingface.co/docs/transformers/training
- Setup steps and code: https://gitlab.com/juliensimon/huggingface-demos/-/tree/main/trainium